UNIX™ System V — Release 2.0 

Administrator Guide 
DEC™ Processors 



UNIX is a trademark of AT&T Bell Laboratories 
DEC is a trademark of Digital Equipment Corporation 


Copyright © 1984 AT&T Technologies 
All Rights Reserved 
Printed in U.S.A. 



DEC, MASSBUS, PDP, UNIBUS, and VAX are trademarks of Digital 
Equipment Corporation. 

DIABLO is a registered trademark of Xerox Corporation. 




CONTENTS 


Chapter  1.   INTRODUCTION
Chapter  2.   ADMINISTRATIVE ADVICE (DEC PROCESSORS)
Chapter  3.   SETTING UP THE UNIX SYSTEM (DEC PROCESSORS)
Chapter  4.   AUTO CALL FACILITY INSTALLATION (DEC PROCESSORS)
Chapter  5.   UNIX SYSTEM ACCOUNTING
Chapter  6.   FILE SYSTEM CHECKING
Chapter  7.   LP SPOOLING SYSTEM
Chapter  8.   UNIX SYSTEM REMOTE JOB ENTRY
Chapter  9.   UNIX SYSTEM ACTIVITY PACKAGE
Chapter 10.   UUCP ADMINISTRATION




Chapter 1 

INTRODUCTION 

The Administrator Guide is a reference volume for those who 
administer a UNIX* system on DEC† processors. The guide should
be used to supplement the information contained in the UNIX System 
V User Reference Manual, the UNIX System V Programmer 
Reference Manual, and the UNIX System V Administrator Reference 
Manual. The following paragraphs contain a brief description of each 
chapter of the guide. 


The chapter “ADMINISTRATIVE ADVICE (DEC PROCESSORS)” 
contains helpful advice and suggestions for administrators of the 
UNIX system on DEC processors. 

The chapter “SETTING UP THE UNIX SYSTEM (DEC 
PROCESSORS)” describes the setup procedures for installing a UNIX 
operating system on DEC processors. 

The chapter “AUTO CALL FACILITY INSTALLATION (DEC
PROCESSORS)” outlines the procedures for properly installing the
(software) automatic call-up facility.

The chapter “UNIX SYSTEM ACCOUNTING” describes the structure,
implementation, and management of the accounting system. 

The chapter “FILE SYSTEM CHECKING” describes the file system 
check program (fsck) of the UNIX system. Fsck audits and
interactively repairs inconsistencies in the file system.


The chapter “LP SPOOLING SYSTEM” defines the LP program and 
describes the role of the LP administrator in performing restricted 
functions and overseeing the smooth operation of LP. 


* Trademark of AT&T Bell Laboratories 
† Trademark of Digital Equipment Corp.




The chapter “UNIX SYSTEM REMOTE JOB ENTRY” defines the 
UNIX system remote job entry (RJE) and describes the 
administrative duties of an RJE administrator. 

The chapter “UNIX SYSTEM ACTIVITY PACKAGE” describes the 
design and implementation of the UNIX system activity package. 
The package reports UNIX system-wide statistics. 


The chapter “UUCP ADMINISTRATION” describes how a uucp
network is set up, the format of the control files, and administrative 
procedures. 





Chapter 2 

ADMINISTRATIVE ADVICE 
(DEC PROCESSORS) 

GENERAL 

The information contained in this chapter applies to the following
DEC processors:

• Digital Equipment Corp. VAX*-11/780, -11/750

• Digital Equipment Corp. PDP*-11/70.


ADMINISTRATOR’S ROAD MAP

This chapter contains administrative advice based on the experience 
and suggestions of many system administrators. Other reasonable 
approaches may be taken to solve many of the problem areas 
described. 

Getting started as a UNIX system administrator is hard work. There 
are no real shortcuts to a working knowledge of the system. The 
system administrator will need time for reading, studying, and 
hands-on experimenting. The system administrator should not go
“live” with the system until he or she has had several weeks to learn
the job and get the initial hardware quirks ironed out.

The administrator should be familiar with a lot of the distributed 
documentation. The “Introduction” and “How to Get Started” 
sections of the UNIX System V User Reference Manual as well as all 
of the sections of the UNIX System V Administrator Reference 
Manual should be studied. 


* Trademark of Digital Equipment Corp. 




Throughout this chapter, each reference of the form name(1M),
name(7), or name(8) refers to entries in the UNIX System V
Administrator Reference Manual. References to entries of the form
name(N), where "N" is the number 1 or 6 possibly followed by a
letter, refer to entry name in section N of the UNIX System V User
Reference Manual. If "N" is a number 2 through 5 possibly followed
by a letter, refer to entry name in section N of the UNIX System V
Programmer Reference Manual.


In these manuals, pay special attention to: acct(1M), checkall(1M),
chmod(1), chown(1), config(1M), cpio(1), date(1), dcopy(1M),
df(1M), don(1M), du(1), ed(1), env(1), errpt(1M), find(1),
format(1M), fsck(1M), fuser(1M), kill(1), mail(1), mkdir(1),
mkfs(1M), ncheck(1M), ps(1), rm(1), rmdir(1), shutdown(1M),
stty(1), su(1), sync(1M), time(1), vcf(1M), volcopy(1M), wall(1M),
who(1), and write(1); acct(4); all of section 7; and crash(8),
750ops(8), 780ops(8), and 70boot(8).


CONFIGURATION GUIDELINES 

Minimum recommended configurations are shown in Figure 2-1. 


HARDWARE CONFIGURATION                    SIMULTANEOUS USERS

PDP-11/70; 768K-byte memory;
RP06 (RP04, RP05) disks
(two or more drives)                              32

Above with 1M-byte memory and
a disk drive (or fixed-head disk)
set aside for the root file system                40

VAX-11/780; 2M-byte memory;
at least three RP06 or RM05 disks                 48


Figure 2-1. Recommended Configurations 




DISK FREE SPACE 

Making files is easy under the UNIX operating system. Therefore, 
users tend to create numerous files using large amounts of file space. 
It has been said that the only standard thing about all UNIX systems 
is the message-of-the-day telling users to clean up their files. 
Administratively, both free disk blocks and free inodes (UNIX system 
talk for file headers) can be a problem. If the free inode count falls 
below 100, the system spends most of its time rebuilding the free 
inode array. If a file system runs out of space, the system prints 
“no-space” messages and does little else. To avoid problems, the 
following start-of-day free counts should be maintained: 


• The file system containing /tmp (temporary files): 


- 16-user system: 1500 free kilobytes (KB). 

- 40-user system: 3000 free KB. 


• The file system containing /usr:


- 3000 to 6000 free KB depending on load. 


• Other user file systems: 

- 6 to 10 percent free depending on user habits 
(3000 KB minimum). 
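
The start-of-day targets above lend themselves to a mechanical check. The following is a minimal sketch only, not part of the distributed system: the function names check_free and user_fs_min are hypothetical, a shell with $(( )) arithmetic is assumed, and the free-space figures are expected to be supplied by the administrator (for instance, from df(1M) output, whose format is not assumed here).

```sh
# check_free NAME FREE_KB MIN_KB
# Report whether a file system meets its start-of-day free-space target.
# (Hypothetical helper; the numbers are supplied by the caller.)
check_free() {
    if [ "$2" -lt "$3" ]; then
        echo "$1: LOW ($2 KB free, want at least $3)"
        return 1
    fi
    echo "$1: ok ($2 KB free)"
}

# user_fs_min TOTAL_KB PERCENT
# Target for "other user file systems": PERCENT of the file system size,
# but never less than the 3000 KB minimum given in the text.
user_fs_min() {
    want=$(( $1 * $2 / 100 ))
    if [ "$want" -lt 3000 ]; then
        want=3000
    fi
    echo "$want"
}
```

For example, check_free /tmp 1200 1500 reports a shortfall on a 16-user system, and user_fs_min 65500 6 gives the 6 percent target for a 65,500 KB file system.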

This brings up an associated problem: how big should file systems 
be? The preference is to set aside space on each drive for a copy of 
root/swap and use the rest of the pack for a single file system. 
However, if you have user groups that fight over disk space, it may 
be better to split them up arbitrarily (i.e., divide a pack into more 
than one file system). 


Warning: If different disk drives are set up with 
differing cylinder partitions between file systems, it 
will eventually lead to an operational blunder. 




A FEW WORDS ABOUT SYSTEM TUNING 

A file system reorganization can help throughput but at the expense 
of down time. If the reorganization is done during nonprime time, it 
can help. For details refer to the Tuning and Configuration Guide. 

If normal shutdown and filesave procedures are used, the file system
check program [fsck(1M), -S option] will help keep the disk free list
in reasonable order. Try to keep disk drive usage balanced. If there
are over 20 users, the root file system (/bin, /tmp, and /etc) deserves
a drive of its own. If there is a noisy modem (poorly executed
do-it-yourself null-modem) or a disconnected modem cable, the UNIX
system will spend a lot of CPU time trying to get it logged in. A
random check of systems uncovers a lot of this going on.


WHY A SPARE DISK DRIVE IS NEEDED 

Without a spare disk drive, the system will be down when a drive is 
down. Also, without the spare drive, it is difficult to reorganize file 
systems or to save and restore user files. 


DISK PACKS 

Only fully ECC (Error Correcting Code) correctable disk packs should 
be bought. The pack should be tested; and if uncorrectable errors 
develop, recondition the pack or get rid of it. 


RP06 disk packs used with the UNIX system need not be totally error 
free but must be “flag-free”. The term flag-free means that there 
should be no unrecoverable ECC errors. Technically, proper ECC 
handling can recover from 11-bit error bursts. However, the length 
of bursts can grow as a pack ages. It is recommended that no pack 
that has more than 8-bit error bursts be accepted. 

For the PDP-11/70, in reading the formatter printout, ECC 
correctable errors are identified by the headings “DATA ERROR 
DURING WRITE CHECK”. Error-register values are printed below 
the message. The two registers of interest are RPER1 and RPEC2. 
A RPER1 value of 1000000 indicates ECC (no other bits on). The
RPEC2 register describes the bit span of the error. For example,
RPEC2=003774 means that there was an unacceptable 9-bit (binary 
0000011111111100) error burst; RPEC2=002400 is an acceptable 3-bit 
span (0000010100000000— there may be zero bits mixed in). If such 
acceptable errors account for all “unrecoverable” errors reported (and 
there are not too many of them), then you have a flag-free pack. 
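
The bit-span reading of RPEC2 described above can be checked with a few lines of shell arithmetic. This is a sketch under stated assumptions, not a distributed tool: the function name burst_span is hypothetical, a shell with $(( )) arithmetic is assumed, and the argument is taken to be the octal register value as printed by the formatter (at most six digits).

```sh
# burst_span OCTAL
# Print the width, in bits, from the lowest to the highest set bit of an
# octal RPEC2 value (zero bits inside the span are counted, as in the text).
burst_span() {
    v=$(( 0$1 ))                 # the leading 0 makes the digits octal
    if [ "$v" -eq 0 ]; then
        echo 0
        return
    fi
    while [ $(( v & 1 )) -eq 0 ]; do
        v=$(( v >> 1 ))          # discard trailing zero bits
    done
    n=0
    while [ "$v" -ne 0 ]; do
        n=$(( n + 1 ))           # count the remaining bit positions
        v=$(( v >> 1 ))
    done
    echo "$n"
}
```

With the two values from the text, burst_span 003774 prints 9 and burst_span 002400 prints 3; a result above 8 marks a burst the text treats as unacceptable.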


For the VAX, even this scant information was not available, so a
formatter has been written (it tells its tale in English); see
format(1M).


PROTECTING USER FILES 

Users, especially inexperienced ones, occasionally remove their own 
files. Open files are sometimes lost when the system crashes. Once 
in a great while, an entire file system will be destroyed (picture a 
disk controller that goes bad and writes when it should read). Here 
is a suggested file backup procedure: 

• Each day copy all user file systems to backup packs. Keep 
these packs 3 to 5 days before reusing them. 

• Once a week copy each file system to tape. Keep weekly tapes 
for 8 weeks. 

• Keep bimonthly tapes “forever” (they should be recopied once 
a year). 


The most recent weekly tapes should be kept off premises. The other 
tapes should be in a fireproof safe if available and not too expensive. 

When the UNIX system goes down, active files can get scrambled. 
Your users will not want to start the day over every time the system 
fails. In addition to good backup, you must have file system patching 
expertise available (on-site or on-call). If the system is ever rebooted 
for general use without first checking the file systems, terrible things 
will happen. Study checkall(1M), fsck(1M), and crash(8) as well as
the “File System Checking” chapter for more information. 




FILE SYSTEM BACKUP PROGRAMS 

The following backup programs are distributed: 


• Find/cpio: The UNIX system is distributed in cpio format. 
The -cpio option of the find command can be used for saving
only those files that have changed or been created over a 
definite period. 

• Volcopy: Physical file system copying to disk or tape. For 
those with a spare drive, volcopy to disk provides convenient 
file restore and quick recovery from disk disasters. Tape 
volcopy provides good long-term backup because the file 
system can be read-in fairly quickly, mounted, and browsed 
over. Disk and tape volcopy are generally used together for 
short- and long-term backup. Note that a volcopy from a 
mounted file system may result in an inconsistent copy (files 
being written at the time can contain invalid data). 


Figure 2-2 summarizes attributes of these programs. In the figure, 
the file system size is 65,500 KB in all cases; times are in minutes; 
judgements are subjective. 


The spare disk drive is strongly recommended. The speed and 
convenience of volcopy are by no means the only advantage of a 
spare drive. It is strongly recommended that the administrator 
modify the /etc/filesave and /etc/checklist files to meet the 
operational needs and update the local operator’s manual accordingly. 
Remember, the more the administrator automates and documents 
operational procedures the less downtime will be encountered. 


CONTROLLING DISK USAGE 

If the UNIX system is a success, disk space will soon become limited. 
During the long delay before more drives become available, usage 
should be controlled. Try to maintain the start-of-day counts 
recommended. Watch usage during the day by executing the df(1M)
command regularly. 





                              FIND/CPIO   VOLCOPY (DISK)   VOLCOPY (TAPE)

Full dump time                   40             2               15
Incremental dump time             7             -                -
Full restore time                80             2               15
Incremental restore time         10             -                -
Ease of restoring:
   one file                     fair          good             fair
   a directory                  fair          good             good
   scattered files              poor          good             good
   full restore                 fair        very good          good
Needs tape drive                yes            no              yes
Needs spare file system
   (two CPUs can share)          -             yes               -
Maintains pack/tape labels       no            yes               -
Handles multireel tape          yes             -              yes
512 KB per record               1.10           88               10
Interactive
   (i.e., ties up console)      yes            yes             yes
May require separate
   I/D space                     no            no*              no


* KB per record are cut to 22 without separate I/D space. 

Figure 2-2. File System Backup Programs 


The du(1) command should be executed (after hours) regularly (e.g.,
daily), and the output kept in an accessible file for later comparison.
In this way, users rapidly increasing their disk usage may be spotted.
This can also be accomplished by running the accounting system’s
acctdusg program [see acct(1M)] as shown in the “UNIX System
Accounting” chapter.
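
One way to compare saved du(1) output from two days is sketched below. It is an illustration only: the function name du_growth is hypothetical, the two files are assumed to hold plain "blocks pathname" lines as printed by du, an awk that accepts -v is assumed, and pathnames containing blanks are not handled.

```sh
# du_growth OLDFILE NEWFILE MIN
# Print each directory whose block count grew by at least MIN between
# two saved du(1) listings, followed by the amount of growth.
du_growth() {
    awk -v min="$3" '
        NR == FNR { old[$2] = $1; next }        # first file: remember old sizes
        ($2 in old) && ($1 - old[$2] >= min) {  # second file: compare sizes
            print $2, $1 - old[$2]
        }' "$1" "$2"
}
```

A nightly job might save du output under dated names and run du_growth on the last two files, mailing any lines printed to the administrator.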


The find(1) command can be used to locate inactive (or large) files.
For example: 


find / -mtime +90 -atime +90 -print >somefile 


records in somefile the names of files neither written nor accessed in 
the last 90 days. 


The administrator will also have to balance usage between file 
systems. To do this, user directories must be moved. Users should be
taught to accept file system name changes (and to program around
them— preferably ahead of time). The user’s login directory name 
(available in the shell variable HOME) should be utilized to 
minimize pathname dependencies. User groups with more extensive 
file system structures should set up a shell variable to refer to the 
file system name (e.g., FS).


The find(1) and cpio(1) commands can be used to move user
directories and to manipulate the file system tree. The following
sequence is useful (it moves the directory trees userx and usery from
file system filesys1 to file system filesys2 where, presumably, more
space is available):


cd /filesys1
find userx usery -print | cpio -pdm /filesys2
# Make sure new copy is OK.
# Change userx and usery login directories
# in the /etc/passwd file.
# Notify userx and usery via mail(1) that
# they have been moved and that pathname
# dependencies in their .profile and shell
# procedures may need to be changed. See the
# discussion on $HOME above.
rm -rf /filesys1/userx /filesys1/usery


When moving more than one user in this way, keep users with 
common interests in the same file system (these users may have 
linked files) and move groups of users who may have linked files with 
a single cpio command (otherwise linked files will be unlinked and 
duplicated). 
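
The "make sure new copy is OK" step in the sequence above is left to the administrator. One partial check, sketched here under stated assumptions, is to confirm that both trees contain the same relative pathnames before the rm is run. The function name same_names is hypothetical, and the check compares names only, not file contents, ownership, or links.

```sh
# same_names SRC DST
# Succeed if the directory trees SRC and DST contain exactly the same
# relative pathnames. Returns 2 if either directory cannot be entered.
same_names() {
    a=$(cd "$1" && find . -print | sort) || return 2
    b=$(cd "$2" && find . -print | sort) || return 2
    [ "$a" = "$b" ]
}
```

For the example above, same_names /filesys1/userx /filesys2/userx would be run for each moved tree, and the rm postponed if it fails.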


REORGANIZING FILE SYSTEMS 

There is a new file system reorganization utility called dcopy(1M).
On an otherwise idle system, a reorganized file system has almost 
twice the I/O throughput of a randomly organized file system. This 
applies to file copying, finds, fscks, etc. Dcopy can take up to 2.5 
hours to initially reorganize (copy) a large file system. During 
reorganization, the system can be up, but the file system being copied 
must be unmounted. 




For those who can afford the operator time, root reorganization once 
a week (requires system reboot) and user file system reorganization 
once a month will improve system performance. Dcopy is an 
interim step. 


KEEPING DIRECTORY FILES SMALL 

Directories larger than 5K bytes (320 entries) are very inefficient 
because of file system indirection. A UNIX system user once 
complained that it took the system 10 minutes to complete the login 
process; it turned out that his login directory was 25K bytes long, and 
the login program spent that time fruitlessly looking for a 
nonexistent .profile file. A large /usr/mail or /usr/spool/uucp 
directory can also really slow the system down. The following will 
ferret out such directories: 


find / -type d -size +10 -print 


Removing files from directories does not make the directories get 
smaller (the empty directory entries are available for reuse). The 
following will “compact” /usr/mail (or any other directory): 


mv /usr/mail /usr/omail
mkdir /usr/mail
chmod 777 /usr/mail
cd /usr/omail
find . -print | cpio -plm ../mail
cd ..
rm -rf omail


ADMINISTRATIVE USE OF “CRON” 

The program cron(1M) is useful in the administration of the system;
it can be used to: 

• Turn off the programs in directory /usr/games during prime 
time. 




• Run programs off-hours: 

- accounting; 

- file system administration; 

- long-running, user-written shell procedures. 
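
As an illustration of the first use listed above, the sketch below decides which permission mode the programs in /usr/games should have at a given hour. Everything specific in it is an assumption made for the example: the function names, the 8:00 to 17:00 prime-time window, and the modes 000 and 755 are not taken from the distributed system.

```sh
# prime_time HOUR
# Succeed when HOUR (0-23) falls inside the assumed prime-time window.
prime_time() {
    [ "$1" -ge 8 ] && [ "$1" -lt 17 ]
}

# games_mode HOUR
# Print the chmod(1) mode to apply to the programs in /usr/games:
# no access during prime time, normal access otherwise.
games_mode() {
    if prime_time "$1"; then
        echo 000
    else
        echo 755
    fi
}
```

A job started by cron(1M) at the edges of the window could pass the printed mode to chmod(1); the exact schedule is a local decision.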


WATCH OUT FOR FILES AND DIRECTORIES 
THAT GROW 

Most of the files below are restarted automatically by entries in
/etc/rc at system reboot. 


• Accounting files:

   - /etc/wtmp: login information; grows extremely fast
     with terminal line difficulties; use acctcon(1M) to
     determine the offending line(s).

   - /usr/adm/pacct: per-process accounting records; gets
     big quickly; monitored automatically by ckpacct from
     cron(1M).

   - /usr/lib/cron/log: status log of commands executed by
     cron(1M); also watch this file for error messages from
     the programs being executed in
     /usr/spool/cron/crontabs/*.

   - /usr/adm/errfile: hardware error logging info; also
     read login adm’s mail periodically.

   - /usr/adm/ctlog: a log of the people who use the ct(1C)
     command.

   - /usr/adm/sulog: a log of those who execute the
     superuser command.

   - /usr/adm/Spacct: process accounting files left over
     from an accounting failure; remove these files unless the
     accounting files that failed are to be rerun.




• Other files:

   - /usr/spool: spooling directory for line printers,
     uucp(1C), etc., whose subdirectories should be
     compacted as described above.
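
A periodic check of the files listed in this section can be sketched as follows. The function name over_limit and the idea of a single byte limit are assumptions made for the example; suitable limits depend on the installation.

```sh
# over_limit LIMIT FILE...
# Print the name and size in bytes of each existing FILE larger than LIMIT.
# Files that do not exist are skipped silently.
over_limit() {
    limit=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] || continue
        size=$(( $(wc -c < "$f") ))   # arithmetic strips any padding from wc
        if [ "$size" -gt "$limit" ]; then
            echo "$f $size"
        fi
    done
    return 0
}
```

For example, over_limit 500000 /etc/wtmp /usr/adm/pacct /usr/adm/sulog lists whichever of those files has passed 500,000 bytes.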


ALLOCATING RESOURCES TO USERS 

A prospective user should first obtain authorization to use the system 
and then apply for a login by providing the following information to 
the “system administrator”: 


• User’s name. 

• Suggested login name (not more than eight characters, 
beginning with a lowercase letter and not containing special or 
uppercase letters). 

• Relationships to other users (this influences the choice of the 
file system). 

• Estimate of required file space (this also influences the choice 
of the file system) and connect hours. This aids in hardware 
growth planning. 


Users must have passwords with at least six characters. (Only the
first eight characters are significant.) Also, every password must
have at least two alphabetic characters and one numeric or special
character. The password must differ from the user’s login name and
any reverse or circular shift of it. Refer to passwd(1) and
passwd(4) for more information on password selection and password
aging.
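
The password rules above can be expressed as a short check. This is a sketch of the stated rules only, not the logic of the passwd(1) command itself: the function names reverse and pw_ok are hypothetical, and a shell supporting ${#var} and the ${var#pattern} forms is assumed.

```sh
# reverse STRING: print STRING with its characters in reverse order.
reverse() {
    s=$1
    r=
    while [ -n "$s" ]; do
        r="${s%"${s#?}"}$r"      # prepend the first character of s
        s=${s#?}                 # then drop that character
    done
    printf '%s' "$r"
}

# pw_ok LOGIN PASSWORD: succeed if PASSWORD satisfies the rules in the text.
pw_ok() {
    login=$1
    pw=$2
    [ "${#pw}" -ge 6 ] || return 1                        # six characters
    alpha=$(printf '%s' "$pw" | tr -cd 'A-Za-z')
    other=$(printf '%s' "$pw" | tr -d 'A-Za-z')
    [ "${#alpha}" -ge 2 ] || return 1                     # two alphabetic
    [ "${#other}" -ge 1 ] || return 1                     # one numeric/special
    [ "$pw" != "$(reverse "$login")" ] || return 1        # not the reverse
    if [ "${#pw}" -eq "${#login}" ]; then
        case "$login$login" in
            *"$pw"*) return 1 ;;    # the login name or a circular shift of it
        esac
    fi
    return 0
}
```

The circular-shift test relies on the fact that every rotation of a string appears inside that string written twice.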




THE MATTER OF ACCOUNTING AND USAGE 

You should run the accounting programs even if there is not a “bill” 
for service. Otherwise, users’ habits (especially bad habits) will be a 
mystery to you. Accounting information can also help you find 
performance bottlenecks, unused logins, bad phone lines, etc. 


DIAL-LINE UTILIZATION 

If prime-time dial-line utilization gets much over 70 percent, users 
will start to encounter busy signals when dialing in. This, in turn, 
will lead to “line hogging”. The only solutions are to acquire more 
dial-up ports, get a larger (another) machine, or to get rid of users. 
Manual policing will help some, but “automatic” policing will be 
invariably subverted by users. 


“BIRD-DOGGING” 

When the system is busy (lines busy and/or slow response), someone
should determine why this is so. The who(1) command lists the
people logged in. The ps(1) command shows what they are doing.
Unfortunately, ps operates from heuristics that can consistently fail
to report certain processes in a busy system. That is, one must be
careful about hanging up an apparently inactive line. The
acctcom(1M) command can read the process accounting file
/usr/adm/pacct backwards from the most recent entry. It will print
entries for selected lines or login names.


TERMINALS 

Do not use uppercase-only terminals. Use full-duplex, full-ASCII
asynchronous terminals. Hardware horizontal tabbing is very 
desirable because it increases output speed and lowers system 
overhead. A fair proportion of the terminals should provide for 
correspondence-quality hard copy output to take advantage of the 
UNIX system word processing capabilities; see term(5). 




LINE PRINTERS 

Most line printers are troublesome and impose considerable overhead 
on the system. Most also lack hardware tabs, character overstrike 
capability, etc. A printer that will work over an asynchronous link 
(DC1/DC3 protocol required) is the best bet. 


SECURITY 

The current UNIX operating system is not tamperproof. The system
administrator cannot keep people from “breaking” the system but
can usually detect that they have done so. The following command
will mail (to root) a list of all “set user ID” programs owned by root
(superuser):


find / -user root -perm -4100 -exec ls -l {} \; | mail root


Any surprises in root’s mail should be investigated. In dealing with 
security, 


• Change the superuser password regularly. Do not pick obvious 
passwords (choose 6-to-8 character nonsense strings that 
combine alphabetics with digits or special characters). 

• Dial ports that do not require passwords usually cause trouble.

• The chroot(1M) and su(1) commands are inherently
dangerous as are group passwords.

• Login directories, .profile files, and files in /bin, /usr/bin,
/lbin, and /etc that are writable by others than their
respective owners are security weak spots; police the system 
regularly against them. 

• Remember, no time-sharing system with dial ports is really 
secure. Do not keep top secret information on the 
system. 
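
The find command shown earlier in this section selects files with -perm -4100, which matches a file only when every bit in the octal mask 4100 is set in its mode (4000 is the set-user-ID bit; 100 is execute permission for the owner). The arithmetic behind such a test is sketched below; the function name mode_has is hypothetical and a shell with $(( )) arithmetic is assumed.

```sh
# mode_has MODE MASK
# Succeed if every bit of the octal MASK is also set in the octal MODE,
# which is how a "-perm -MASK" test in find(1) selects files.
mode_has() {
    [ $(( 0$1 & 0$2 )) -eq $(( 0$2 )) ]
}
```

For instance, mode_has 4755 4100 succeeds while mode_has 755 4100 fails. The same arithmetic with mask 002 identifies files writable by others, the weak spot mentioned in the list above.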




COMMUNICATING WITH THE USERS 

The directory /usr/news and the news(1) command are provided as
a way to get “brief” announcements to your users. More pressing
items (one-liners) can be entered in the /etc/motd (message of the
day) file; motd and (new to the user) news are announced at login
time.

To reach users who are already logged in, use the wall(1M) (write
all) command. Do not use wall while logged in as superuser, except
in emergencies.


The /usr/news directory should be cleaned out once a week by 
removing everything older than 2 months. It has been found that on 
most systems a file in /usr/news will reach 50 percent of the users 
within a day and over 80 percent within a week; motd should be 
cleaned out daily. 
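
The weekly cleanup of /usr/news can be prepared with find(1). The sketch below only lists candidates and removes nothing; the function name stale_news is hypothetical, and treating "2 months" as 60 days is an assumption made for the example.

```sh
# stale_news DIR DAYS
# List the ordinary files under DIR last modified more than DAYS days ago.
stale_news() {
    find "$1" -type f -mtime +"$2" -print
}
```

Running stale_news /usr/news 60 shows what a cleanup would remove; once the list has been reviewed, the same find expression can be given an -exec rm action instead of -print.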


TROUBLESHOOTING 

It would be easy to write a book on troubleshooting. The following is 
some effective advice in dealing with troubles. In dealing with 
hardware support service personnel, 

• Be sure that the contractor agrees to get along with the UNIX 
software before you take out a hardware service contract with 
DEC or with someone else (“It’s the hardware,” says you; “It’s 
the software,” says the hardware service contractor). 

• Keep on top of problems. For instance, DEC has a problem- 
aging priority scheme. Find out about any such scheme that 
your contractor may have and make them prove that it is 
being followed. Remember that an unreported problem is 
getting no priority at all. If a problem persists, escalate it 
through the contractor’s local management chain; it may also 
be effective to complain to the contractor’s sales 
representative. 

• For effective service, an extended period support service 
offering (e.g., 16 hours/day, 6 days/week) should be provided. 
Arrange for preventive maintenance, noncritical repair, and 
add-on installation work to be done before or after prime time. 




• Know the details of the support service offering applicable to 
the installation. In particular, make sure that preventive 
maintenance is scheduled in advance and that it is completed. 

• A “site log” should be maintained for the hardware. All 
troubles should be recorded in the log by the support service 
personnel and/or the operating personnel. 

• Make sure that the hardware vendor (as well as the hardware 
service contractor, if the two are different) agrees to the 
presence of non-DEC equipment on your system (even if you 
have none to start with). 

• Run error logging and maintain console sheets. Make sure 
error messages are shown to support service personnel. 

• Take core dumps after system crashes and have them available 
for support service personnel. 

• Keep records of downtime and make sure that support service 
personnel know about them. 

• After having considerable configuration trouble on the VAXs, 
a stand-alone utility vcf was developed which tests for the 
presence of both MASSBUS* and UNIBUS* devices. The 
resulting list of addresses is compared with the operating 
system object module (/unix). Device address and interrupt 
vector discrepancies are reported. A complete configuration 
list can be generated as well. The vcf should be run after any 
DEC service or any operating system software change. It can 
save days of downtime. 


* Trademark of Digital Equipment Corp. 




Telephone problems are most apt to occur when rearranging or 
adding equipment. Occasionally, central office, trunking, or modem 
failures occur. In dealing with the telephone services vendor, 

• Be specific with repair operators. Tell the operators that the 
trouble involves data equipment. 

• If the first call fails to get results, ask for the “supervisor” on 
the second call, and if necessary, escalate further to get the 
problem solved. 


Some of the obvious problem areas are: 


• Disk Drives— Over 50 percent of the problems are likely to be 
related to the disk subsystem. As mentioned earlier, the way 
to keep the system up is to have a spare disk drive. Remember 
that preventive maintenance of disk drives is very important. 
Make sure that the support service personnel who service the 
hardware see the error-logging printouts and console error 
messages produced by the UNIX system (and that they 
understand them). Disk failure can ruin a file system. The 
only defense is to make a complete, daily file backup! (See the 
part “Protecting User Files”.) 

• Dial Ports— In the dial-in interface area, there is room for 
finger-pointing among all involved vendors. Check for obvious 
things such as is the system in “multiuser” mode, is the 
/etc/inittab file OK, or are any cables loose (both ends)? In
some telephone offices, trunk hunting is based on 10-number 
groups. Hunting between such groups can fail independently 
of anything else. The possibilities for trouble are many. 
Figure 2-3 attempts to describe some alternatives; it is meant 
primarily for users of the DH11/DZ11 asynchronous devices. 
As an example of the format, (vertical) Rule 3 reads: “If line 
rings and ring light shows and computer does not answer and 
switching the modem solves the problem, then it is likely to be 
a telephone company problem; also, busy out that line.” 




• Synchronous Ports— High-speed synchronous interface devices 
are even more trouble than dial equipment. The following is a 
list of potential trouble spots: 


- The UNIX system software.
- Interface device (e.g., KMC11B).
- Cable to the modem.
- The modem.
- The communications line.
- Other modem.
- Other cable.
- Other interface device.
- Other system’s software.


Rules:                                    1  2  3  4  5  6  7  8  9  0

Condition:
  Line rings                              N  Y  Y  Y  Y  Y  Y  Y  Y  Y
  Ring light shows on telephone console   -  N  Y  Y  Y  Y  Y  Y  Y  Y
  Computer answers                        -  -  N  N  Y  Y  Y  Y  Y  Y
  Login message received on terminal      -  -  -  -  N  N  Y  Y  Y  Y
  Switching modem solves problem          -  -  Y  N  Y  N  -  -  -  -
  User can login                          -  -  -  -  -  -  N  N  N  Y
  Telephone console shows data received   -  -  -  -  -  -  Y  Y  N  -
  Problem affects whole DH/DZ             -  -  -  -  -  -  Y  N  -  -

Diagnosis and/or Action:
  No problem                              -  -  -  -  -  -  X  -  -  X
  Processor hardware problem likely       -  -  -  X  -  X  -  -  -  -
  Telephone problem likely                X  X  X  -  X  -  -  -  X  -
  May be a problem with user’s terminal   -  -  -  -  -  -  X  X  -  -
  Busy out bad line(s)                    X  X  X  X  X  X  -  X  -  -

Figure 2-3. Asynchronous Line Problems 


• Power Supply Modules— There are a lot of them, and they do 
fail, more or less regularly. Hard failure can be detected at 
the console; voltage drift is tougher. Failure of the FP11 
(floating-point unit) power supply (on PDP-11/70) can be slow 
to fix because service personnel are likely to work back from 
the far end of the “bus” taking a long time to find the 
problem. 
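
Rules 1 through 6 of Figure 2-3 can be written as a small decision procedure, which may help when the table is used by operators. The sketch below is an illustration only: the function name diagnose and its argument order are hypothetical, each argument is Y, N, or - (not applicable), and rules 7 through 0 are deliberately left to the figure.

```sh
# diagnose RINGS LIGHT ANSWERS LOGINMSG MODEMFIX
# Apply rules 1-6 of Figure 2-3. Each argument is Y, N, or - in the
# order: line rings, ring light shows, computer answers, login message
# received, switching modem solves problem.
diagnose() {
    case "$1$2$3$4$5" in
        N????) echo "telephone problem likely; busy out line" ;;          # rule 1
        YN???) echo "telephone problem likely; busy out line" ;;          # rule 2
        YYN?Y) echo "telephone problem likely; busy out line" ;;          # rule 3
        YYN?N) echo "processor hardware problem likely; busy out line" ;; # rule 4
        YYYNY) echo "telephone problem likely; busy out line" ;;          # rule 5
        YYYNN) echo "processor hardware problem likely; busy out line" ;; # rule 6
        *)     echo "see rules 7 through 0 of Figure 2-3" ;;
    esac
}
```

Rule 3, quoted in the text, corresponds to diagnose Y Y N - Y.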




DATA SET OPTIONS 

The following data set options seem to work with the UNIX system: 


The 801C-L1 (Auto-Call Unit):

   Jumpers:
      E2 to E3
      E6 to E5

   Options:
      Y, X, T, B, ZG, ZP, G, R, ZT

   Switches (0 = open, 1 = closed, i.e., side next to number is down):
      S1 = 1000[1]   (Bracketed switches are missing on some models.)
      S2 = 0101
      S3 = 11010
      S4 = 11[00]

The 212A-L1 (1200-baud full duplex):

   Options:
      E, ZF, YF, YC, YG, YJ, YK, S, V, A, T, ZH, W, YP, YR

   Switches:
      S1 = [0]001
      S2 = 110001000
      S3 = 11110000   (10100000 on 212AR-L1)
      S5 = 00




NULL MODEM WIRING 

Improperly wired null modems can cause spurious interrupts, 
especially at higher baud rates. A single bad modem on a 9600-baud 
line can waste 15 percent of your CPU power. The following 
(symmetrical) wiring plan will prevent such problems: 

   pin 1 to 1
   pin 2 to 3
   pin 3 to 2
   strap pin 4 to 5 in the same plug
   pin 6 to 20
   pin 7 to 7
   pin 8 to 20
   pin 20 to 6 and 8
   ground unused pins


113D, 103J DATA SET PROBLEMS

The DH11 and DJ11 multiplexers normally have a jumper connecting 
pin 25 to pin 4 (request to send), thus asserting pin 25 when the line 
is opened. This jumper should be removed for any lines connected to 
113Ds or 103Js (also applies to 103Js with 801s). 





Chapter 3 


SETTING UP THE UNIX SYSTEM 
(DEC PROCESSORS) 

GENERAL 

This chapter describes the load, update, and configuration procedures 
involved in installing a UNIX operating system on the following DEC 
processors: 


• Digital Equipment Corp. VAX-11/780, -11/750 

• Digital Equipment Corp. PDP-11/70. 


Prerequisites 

Before attempting to generate a UNIX operating system, the system 
administrator should understand that a considerable knowledge of 
the related documentation is required and assumed. In particular, 
the administrator should be very familiar with the following 
documents: 


UNIX System V User Guide

UNIX System V User Reference Manual

UNIX System V Administrator Reference Manual

UNIX System V Programmer Reference Manual

UNIX System V Operator Guide.


Throughout this chapter, each reference of the form name(1M),
name(7), or name(8) refers to an entry in the UNIX System V
Administrator Reference Manual. A reference of the form
name(N), where "N" is the number 1 or 6 possibly followed by a
letter, refers to entry name in section N of the UNIX System V User
Reference Manual. If "N" is a number 2 through 5 possibly followed
by a letter, the reference is to entry name in section N of the UNIX
System V Programmer Reference Manual.
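
The citation convention above amounts to a small lookup on the section suffix. The following is a minimal sketch only; the function name manual_for is hypothetical and is not part of any UNIX system release.

```shell
# manual_for SECTION: print the System V manual that holds entries cited
# as name(SECTION), following the convention described in this chapter.
# Hypothetical helper for illustration only.
manual_for() {
    case "$1" in
        1M|7|8)  echo "Administrator Reference Manual" ;;
        1*|6*)   echo "User Reference Manual" ;;
        [2-5]*)  echo "Programmer Reference Manual" ;;
        *)       echo "unknown section: $1" >&2; return 1 ;;
    esac
}

manual_for 1M    # e.g., fsck(1M)
manual_for 1     # e.g., cpio(1)
manual_for 4     # e.g., checklist(4)
```

The 1M case is tested first so that it is not caught by the more general pattern for section 1.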





SETUP 


The system administrator must have a basic understanding of the 
operation of the hardware. This includes the operation of the console 
and the tape and disk drives, which are assumed to have standard 
UNIX system addresses and interrupt vectors. It is also assumed 
that the hardware works and has been completely installed. All 
appropriate DEC diagnostics should have been run to test the 
configuration, and a detailed description of the hardware including 
device addresses, interrupt vectors, and bus levels is needed. This 
information is necessary to generate a UNIX system. 


Procedure 

The UNIX system is distributed on two magnetic tapes, recorded in 
9-track format at either 800 bits-per-inch (bpi) or 1600 bpi. 
Distribution tapes will be marked either “PDP” or “VAX”, 800 bpi or 
1600 bpi. Make sure you have the correct tape for your machine. 


Initial Load 

The initial load program (on Tape 1) will copy a file system from tape 
(VAX-11/780: either a TE16, TU77, or a TU78; VAX-11/750: either a 
TS11 or a TU77; PDP-11/70: either a TU10 or a TU16) to disk (VAX: 
either an RP06, an RP07, an RM05, or an RM80; PDP-11/70: either an 
RP03, an RP06, or an RM05). In this document, RP04/5 drives are 
considered to be equivalent to RP06 drives; any differences will be 
noted explicitly. Once the root file system has been successfully 
loaded to disk, the UNIX system may be booted and the available 
utility programs used to complete the installation. 


Update 

The update procedures are to be used only on releases that are
designated as update releases. The cpio(1) program is used to
perform all updates. The cpio program will not update any file if its
replacement has a modification time that is less than (i.e., earlier
than) the modification time of the original file. Certain
administrative files (e.g., /etc/passwd) are sent with a modification
time of January 1, 1970 to ensure that they do not replace their
counterparts during updates. Any file not copied will cause cpio to
print a message to that effect. These messages should always be
investigated to ensure that every file not copied was an
administrative file of that type. Note, however, that depending on
the respective modification times, a locally modified file may get
updated, thus destroying the local modifications.

One of the most common problems that can arise during an 
installation is running out of disk space when performing an update. 
Should this occur, the original contents of the file system should be 
restored from a backup copy and the contents of the update tape 
should be read into a spare file system using the cpio program. 
Unwanted material can then be removed and the original file system 
can be updated from this new file system using the -pdm options of
cpio. 
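
The modification-time rule can be checked by hand before an update is run. The sketch below is an illustration only: would_replace is a hypothetical helper, not part of cpio, and it relies on the -nt file comparison that current sh implementations provide. The commented command at the end shows the general shape of a pass-mode update, with /spare and /usr as assumed mount points.

```shell
# would_replace NEW OLD: succeed when the incoming file NEW would replace
# the installed file OLD under the rule above, i.e., when NEW is not
# older than OLD. Hypothetical helper for illustration only.
would_replace() {
    if [ ! -e "$2" ]; then
        return 0        # nothing installed yet, so the file is copied
    fi
    if [ "$2" -nt "$1" ]; then
        return 1        # installed copy is newer, so it is left alone
    fi
    return 0
}

# Demonstration with stand-in files in a scratch directory.
tmp=$(mktemp -d)
touch -t 198401010000 "$tmp/from_tape"      # older incoming file
touch -t 198406200000 "$tmp/local_copy"     # newer, locally edited file
if would_replace "$tmp/from_tape" "$tmp/local_copy"; then
    echo "local copy would be overwritten"
else
    echo "local copy would be kept"
fi
rm -rf "$tmp"

# Pass-mode update from a spare file system (assumed mount points):
#   cd /spare; find . -print | cpio -pdm /usr
```

A file with exactly the same modification time as the original is treated as replaceable here, since the rule only excludes replacements that are strictly earlier.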


New Tape and Disk Device Names 

Tape and disk name formats have been inconsistent between different 
vendor machines and computer installations. In this release of the 
UNIX system, a standardization has been established. The new 
formats place disk and tape devices in subdirectories under /dev. To 
make the transition easier, the old names can still exist. However, all 
documentation and sample shell scripts distributed with this release 
will use the new convention. System administrators are encouraged to 
make the new /dev nodes and use them in their shell scripts as soon 
as possible. 


Three notations are used in the format descriptions. The { } are used 
to indicate an optional field. The [] are used to indicate a nonoptional 
field where the user has a few alternatives. The () indicate a field 
that the system administrator may choose to include in the naming 
conventions. 

The new standard for tapes is 

/dev/{r}mt/(c#d)#[hml]{n}




where

r       indicates a raw device. Blocked is the default.

mt      indicates a magnetic tape device.

c#d     indicates the controller number. The c and d are
        included to avoid ambiguity. This field is optionally
        specified by system administrators if they wish to
        specify controller numbers.

#       is the device number.

hml     indicates the density. The h (high) specifies 6250 bpi, m
        (medium) specifies 1600 bpi, and l (low) is 800 bpi.
        Higher density drives that are developed in the future
        will be represented by v (very high) and u (ultra).

n       indicates no rewind on close. The default is rewind.

The new disk format is

/dev/{r}dsk/(r)(c#d)#s#

where

r       indicates a raw interface to the disk. The default is the
        normal system buffering.

dsk     indicates a disk device.

r       indicates that this disk is on a remote system.

c#d     the # indicates the controller number. It is up to system
        administrators to decide whether to specify this field in
        their disk names.

#s#     the first # stands for the drive number. The second #
        stands for the section number. Both fields are free
        format. There is no default drive or section
        number.




Both old and new names can exist on the system. All that the system 
administrators would have to do is link the old names to the new 
ones. 
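
The two formats can be exercised with a pair of small name builders. This is a sketch under stated assumptions: tape_name and disk_name are hypothetical helpers, the optional controller and remote fields are left out, and the dry run assumes that the old names rp0 through rp7 refer to sections 0 through 7 of disk drive 0.

```shell
# tape_name RAW DRIVE DENSITY NOREWIND
#   RAW       "r" for the raw device, "" for the blocked device
#   DRIVE     device number
#   DENSITY   h (6250 bpi), m (1600 bpi), or l (800 bpi)
#   NOREWIND  "n" for no rewind on close, "" for rewind on close
tape_name() {
    case "$3" in
        h|m|l) ;;
        *) echo "density must be h, m, or l" >&2; return 1 ;;
    esac
    echo "/dev/${1}mt/${2}${3}${4}"
}

# disk_name RAW DRIVE SECTION
disk_name() {
    echo "/dev/${1}dsk/${2}s${3}"
}

tape_name r 0 m ""      # raw, drive 0, 1600 bpi, rewind on close
tape_name "" 0 m n      # blocked, drive 0, 1600 bpi, no rewind
disk_name r 0 1         # raw, drive 0, section 1

# Dry run: print (do not execute) links from old-style names to new ones.
for s in 0 1 2 3 4 5 6 7; do
    echo "ln /dev/rp$s $(disk_name "" 0 "$s")"
done
```

Printing the ln commands first, rather than running them, leaves room to confirm the old-to-new mapping for the local hardware before any nodes are created.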


LOAD PROCEDURES 


Distribution Tape Format 

Tape 1 contains eight files: a loader, a physical copy of the root file 
system, the cpio program, a cpio structured copy of the root file 
system, and four files (cpio format) that represent selectable items. 
Root refers to the directory “/”, which is the root of all the directory 
trees. The format of this tape is as follows: 


file 1    Tape boot loader: 512 bytes;
          Tape boot loader: 512 bytes;
          Initial load program: several 512-byte records

file 2    root file system (physical): 5120-byte records
          (blocking factor 10)

file 3    cpio program (latest version): several 512-byte
          records (to be used only for updating an earlier
          UNIX system release)

file 4    the root file system (structured in cpio
          format): several 5120-byte records (to be used
          only for updating an earlier UNIX system
          release)

file 5    on-line manual pages (same format as file 4)

file 6    on-line documents (same format as file 4)

file 7    Remote Job Entry (RJE) software (same format
          as file 4)

file 8    Graphics software (same format as file 4).




The root (/) file system contains the following directories: 


bck           Directory used to mount a backup file system
              for file restoring

bin           Public commands; described in Section 1 of the
              UNIX System V User Reference Manual

dev           Special files, all the devices on the system

etc           Administrative programs and tables

lib           Public libraries, parts of the assembler, C
              compiler

lost+found    Directory used by fsck(1M) for disconnected
              files

mnt           Directory used to mount a file system

stand         Stand-alone programs

tmp           Directory used for temporary files; should be
              cleaned at reboot

usr           Directory used to mount the /usr file system.


Initial Load of Root 

Mount Tape 1 on drive 0 and position it at the load point. 


PDP-11/70

Boot the tape by reading either record 0 or 1 into memory starting at 
address 0 and starting execution at address 0. This may be 
accomplished by using a standard DEC ROM bootstrap loader, a 
special ROM, or some manual procedure; see romboot(8), 
tapeboot(8), and 70boot(8). 




VAX 

See “Installation Boot Procedures” under either 11/780 ops(8) or 
11/750 ops(8). These entries describe initial tape booting and 
modification of the console floppy disk to simplify UNIX system 
administration. 


Common to PDP-11/70 and VAX 

The tape boot loader will type “UNIX tape boot loader” on the 
console terminal and read in and execute the initial load program. 
The program will then type detailed instructions about the operation 
of the program on the console terminal. The program will ask what 
type of disk drive you have and which drive you plan to use for the 
copy. The disk controller used must be at the standard DEC address 
indicated by the program; however, other disk controllers on your 
system may be at nonstandard addresses. A formatted, error 
correcting code (ECC) flag-free pack must be mounted on the drive 
you have indicated. If necessary, use the appropriate DEC diagnostic 
program to format the pack. For the VAX, use format(1M). Note
that the pack will be written on. Next, the program will ask what 
type of tape drive you have and which drive contains the tape. 
Normally, this will be drive 0, but the program will work with other 
drives. Note that the tape is currently positioned correctly after the 
end-of-file between the initial load program and the root file system. 
When everything is ready, the program will copy the file system from 
the tape to disk and give instructions for booting the UNIX system. 
After the copy is complete and you have booted the basic version of 
the UNIX operating system, check [using fsck(1M)] the root file
system and browse through it. 


PDP-11/70 Only 

The file /stand/mmtest is a stand-alone memory mapping diagnostic
program for the PDP-11/70. If you are not absolutely sure that DEC
FCO (field change order) M8140-R002 has been applied to your
PDP-11/70 CPU, /stand/mmtest should be booted and allowed to run
for at least 20 minutes. To boot this program, go through the disk
boot procedure, but specify


0=stand/mmtest 


The UNIX system comes with optional power-fail recovery. This
feature requires that the power-up and power-down ...

Update of Root 

It is very important that the system be running in single-user mode 
during the update phase. To update an already existing root file 
system, files three and four on Tape 1 will be used. It is necessary to 
first make a copy of the root file system using volcopy(1M) and then
update this copy. The copy should be made on a separate disk pack 
using the same section number as the root file system (always section 
0). Also, after the update is completed, check if any of the local 
administrative files in the directory /etc need modification. Most of 
these are mentioned in the part “Administrative Files”. 


Mount Tape 1 on drive 0 and position it at the load point. It is
assumed that disk drive 1 is available for making the copy and that
the root file system is on /dev/dsk/0s0. The following procedure will
first make a copy of the root file system, and then update this copy.
Note that /dev/mt/0mn refers to tape drive 0 but has the side effect
of spacing forward to the next end-of-file (no rewind option). The -B
option of cpio specifies that input is in 5120-byte records.


volcopy root /dev/rdsk/0s0 pkname1 /dev/rdsk/1s0 pkname2
mount /dev/dsk/1s0 /bck
# The 2 echoes will move the tape to file 3
echo </dev/mt/0mn
echo </dev/mt/0mn
cp /dev/mt/0mn /bck/bin/cpio
chmod 755 /bck/bin/cpio
chown bin /bck/bin/cpio
cd /bck
/bck/bin/cpio -idmB </dev/rmt/0m
cd /
umount /dev/dsk/1s0


Pkname1 and pkname2 are the volume names of the source and
destination disk packs, respectively. If the new copy is satisfactory, 
shut down and halt the system, change disk packs, and reboot the 
system using the new root. 




Tape 2 (/usr) Format 

Tape 2 contains the /usr file system in cpio format (5120-byte 
records). The /usr file system contains commands and files that 
must be available (mounted) when the system is in multiuser mode. 
The tape contains the following directories: 


adm           Miscellaneous administrative command and
              data files including the process accounting file
              pacct.

bin           Public commands; an overflow for /bin.

catman        Contains packed formatted manual pages used
              by man(1).

games         Various demonstration and instructional
              programs.

include       Public C language #include files.

lib           Archive libraries including the text processing
              macros; also contains data files for various
              programs such as spell(1).

lost+found    Directory used by fsck(1M) for disconnected
              files.

mail          Mail directory.

news          Place for all the various system news; see
              news(1).

pub           Handy public information, e.g., table of ASCII
              characters.

spool         Spool directory for daemons.

src           Source for commands, libraries, the system, etc.

tmp           Directory for temporary files; should be cleaned
              at reboot.




Initial Load of /usr 

Mount Tape 2 on drive 0 and position it at the load point. Mount a 
file system (device) as /usr. The ultimate size and location of this 
file system on a device is an administrative decision; initially, the 
following procedure will suffice: 


mkfs /dev/rdsk/0s1 65000 gap blocks
# See mkfs(1M) for appropriate parameters.
# Note that for RP03 disks, this will use part of section 2.
labelit /dev/rdsk/0s1 usr pkname
mount /dev/dsk/0s1 /usr
chmod 775 /usr
cd /usr
cpio -idmB </dev/rmt/0m


Pkname is the volume name of the pack (e.g., “p0001”).


Because /usr must be mounted when the system is in multiuser mode,
the file /etc/rc must be changed to include the command lines to
mount and unmount the file systems in single-user and multiuser
mode. These lines must be inserted at the appropriate places in
/etc/rc as indicated by comments in the prototype file. Next, the
files /etc/checklist and /etc/checkall should be changed to include the
file system device (e.g., /dev/rdsk/0s1); see checkall(1M), fsck(1M),
labelit(1M), mkfs(1M), mount(1M), and checklist(4).
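
Adding a device to a list file such as /etc/checklist is easy to do twice by accident. The sketch below appends an entry only when it is not already present. It is an illustration only: add_entry is a hypothetical helper, the one-device-per-line layout is an assumption to be confirmed against checklist(4), and the demonstration writes to a scratch file rather than to /etc/checklist.

```shell
# add_entry FILE DEVICE: append DEVICE to FILE unless an identical line
# is already there. Hypothetical helper for illustration only.
add_entry() {
    if grep -F -x -- "$2" "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$2" >>"$1"
}

# Demonstration on a scratch file standing in for /etc/checklist.
list=$(mktemp)
add_entry "$list" /dev/rdsk/0s0
add_entry "$list" /dev/rdsk/0s1
add_entry "$list" /dev/rdsk/0s1     # repeated call changes nothing
cat "$list"
rm -f "$list"
```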


Update of /usr 

It is advisable that the system be running in single-user mode during
the update phase. It is also wise to first make a copy of your /usr 
file system for backup purposes. Next, mount Tape 2 on drive 0 and 
position it at the load point. The /usr file system must also be 
mounted. The following procedure will perform the update: 


cd /usr 

cpio -idmB </dev/rmt/0m 




Initial Load or Update of Selectable Items 

The initial load and update procedures are essentially the same; the 
only exception being the creation of the selectable item directory on 
the initial load. 

Mount Tape 1 on drive 0 and position it at the load point. Make sure 
that the /usr file system is mounted. The following procedure will 
read in the source for each of the selectable subsystems.

Note: If a particular subsystem is not desired, simply skip
that file on the tape by executing the following command:

echo </dev/rmt/0mn

The tape can be rewound after any subsystem by specifying
/dev/rmt/0m instead of /dev/rmt/0mn.

Enter the following commands:

echo </dev/rmt/0mn; echo </dev/rmt/0mn
echo </dev/rmt/0mn; echo </dev/rmt/0mn

Next, install the unformatted manual pages by

cd /usr
mkdir man; chown bin man; chgrp bin man; chmod 775 man
cd man
cpio -idmB </dev/rmt/0mn

Next, install the on-line machine readable documents by

cd /usr
mkdir docs; chown bin docs; chgrp bin docs; chmod 775 docs
cd docs
cpio -idmB </dev/rmt/0mn

Next, install the RJE source by

cd /usr/src/cmd
mkdir rje; chown bin rje; chgrp bin rje; chmod 775 rje
cd rje
cpio -idmB </dev/rmt/0mn

Finally, install the graphics source by

cd /usr/src/cmd
mkdir graf; chown bin graf; chgrp bin graf; chmod 775 graf
cd graf
cpio -idmB </dev/rmt/0mn
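
Because the four selectable items sit in a fixed order on Tape 1, the choice between loading and skipping each one can be planned before the tape is touched. The following sketch only prints the commands; plan_selectable is a hypothetical helper and nothing is executed against the tape. On an update, the mkdir line it prints would be omitted because the directory already exists.

```shell
# plan_selectable ITEM...: print the commands that load the named items
# (any of man, docs, rje, graf) from a rewound Tape 1 and skip the rest.
# Hypothetical helper for illustration only; it prints, it does not run.
plan_selectable() {
    wanted=" $* "
    # Files 1 through 4 are always skipped.
    for n in 1 2 3 4; do
        echo "echo </dev/rmt/0mn"
    done
    # Files 5 through 8 in tape order, as item:parent-directory pairs.
    for entry in man:/usr docs:/usr rje:/usr/src/cmd graf:/usr/src/cmd; do
        item=${entry%%:*}
        parent=${entry#*:}
        case "$wanted" in
            *" $item "*)
                echo "cd $parent"
                echo "mkdir $item; chown bin $item; chgrp bin $item; chmod 775 $item"
                echo "cd $item"
                echo "cpio -idmB </dev/rmt/0mn"
                ;;
            *)
                echo "echo </dev/rmt/0mn"     # skip this file on the tape
                ;;
        esac
    done
}

plan_selectable man rje
```

Every file on the tape is accounted for by exactly one read or one skip, which keeps the tape position in step with the item being loaded.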




After installing the source for the rje subsystems, the software must 
be built and installed. 


To build and install rje, change your working directory to /usr/src 
and execute one of the following :mkcmd lines: 


ARGS="rje1" ./:mkcmd rje    # makes a single IBM system
ARGS="rje2" ./:mkcmd rje    # makes rje1 and rje2
ARGS="rje3" ./:mkcmd rje    # makes rje1, rje2, and rje3
ARGS="rje4" ./:mkcmd rje    # makes rje1, rje2, rje3, and rje4


See rje(8) and the “UNIX System Remote Job Entry” chapter for
additional information. 


To build and install the graphics package, 


cd /usr/src 
./:mkcmd graf 


CONFIGURATION PLANNING 


UNIX System Configuration 

The basic UNIX operating system supplied supports only the console, 
a disk controller (disk drive 0), and a tape controller (tape drive 0). 
Each system administrator must describe the actual configuration of 
their own system. 


All of the UNIX operating system source code and object libraries are
in /usr/src/uts. All of the configuration information is kept in the
directory /usr/src/uts/*/cf; the * represents either pdp11 or vax.
There are only two files that must be changed to reflect your system
configuration: low.s (univec.c on the VAX) and conf.c. The program
config(1M) should be used to make these changes.


Config requires a “system description file” and produces the two
needed files. Figure 3-1 lists the values and sizes of the basic
parameters for the different CPUs. For more details of syntax and
structure, see config(1M) and the associated master(4).




              PDP-11/70          VAX-11/750, /780
Item        Range     Size       Range     Size

nswap       3000      -          9000      -
buffers     25-60     30*        80-400    56†
sabufs      10-15     542        -         -
hashbuf     32-128    6          32-128    12
physbuf     3-5       30         3-7       56
inodes      100-250   24         100-300   84
iblocks     80-200    52         -         -
files       100-250   8          100-300   12
mounts      8-16      8          8-20      20
coremap     50-100    4          -         -
swapmap     50-100    4          50-100    4
calls       30-60     6          30-60     12
procs       50-200    32         50-200    64
texts       25-50     12         25-50     16
clists      100-300   28         100-250   72
maxproc     25        -          25        -


* Plus 512 bytes/buffer outside system space.
† Plus 1024 bytes/buffer allocated at start up.


Figure 3-1. Parameter Values 


The first part of the system description file lists all of the hardware 
devices on the system. Next, various system information is listed. A 
brief explanation of this information follows: 

• root — Specifies the device where the root file system is to be 
found. The device must be a block device with read/write 
capability because this device will be mounted read/write as 
“/”. Thus, a tape cannot be mounted as the root but can be 
mounted as some read-only file system. Normally, root is disk 
drive 0, section 0. 

• pipe— Specifies where pipes are to be allocated (must be a 
mounted file system— the root file system is normally used). 

• dump— Specifies the device to be used to dump memory after a 
system crash. Currently, only the TU10, TU45/TE16, TU78, 
and TS11 tape drives are supported for this purpose. 




• swap— Specifies the device and blocks that will be used for
swapping. Swplo is the first block number used and nswap 
indicates how many blocks, starting at swplo, to use. Care 
must be taken that the swap area specified does not overlap 
any file system. For example, if section 0 is 8000 blocks long, 
the root file system could occupy the first 6000 blocks, and 
swap the remaining 2000 by specifying: 


root rp06 0 

swap rp06 0 6000 2000 


• buffers— Specifies how many “system buffers” to allocate. 
Real-time response improves as more buffers are allocated. 
UNIX system buffers form a “data cache”. Improvement in 
the hit rate of this cache tends to fall as the number of buffers 
is increased. 

• sabufs— PDP-11/70 only: specifies how many “system
addressable” buffers to allocate. One buffer is needed for 
every mounted file system. Certain I/O drivers need such 
buffers. 

• hashbuf— Specifies how many hash buckets to allocate. These 
are used to search for a buffer given a device number and 
block number. This number must be a power of two. The 
default value is 64. 

• physbuf— Specifies how many physical I/O buffer headers to 
allocate. One is needed for each physical read or write active. 
The default value is 4. 

• inodes— Specifies how many “inode table” entries to allocate. 
Each entry represents a unique open inode. When the table 
overflows, the warning message “Inode table overflow” will be 
printed on the console. The table size should be increased if 
this happens regularly. The number of entries used depends on 
the number of active processes, texts, and mounts. 

• iblocks— PDP-11/70 only: specifies how many “inode block
address cache” entries to allocate. An entry is needed for each
regular, directory, or fifo file that is open. Since special files
do not need an entry, data space can be conserved by
specifying a much smaller number of iblocks than inodes. The
default value is four less than the number of inodes.

• files— Specifies how many “open-file table” entries to allocate.
Each entry represents an open file. When the table overflows,
the warning message “no file” will be printed on the console.
The table size should be increased if this happens regularly.

• mounts— Specifies how many “mount table” entries to allocate.
Each entry represents a mounted file system. The root (/) will
always be the first entry. When full, the mount(2) system call
will return the error EBUSY.

• coremap— PDP-11/70 only: specifies how many entries to
allocate to the “list of free memory”. Each entry represents a
contiguous group of 64-byte blocks of free memory. When the
list overflows, due to excessive fragmentation, the system will
print a warning message on the console. This condition results
in the loss of memory being freed. The system should be
remade with a larger table size and rebooted. The number of
entries used depends on the number of processes active, their
sizes, and the amount of memory available.

• swapmap— Specifies how many entries to allocate to the “list
of free swap blocks”. Exactly like the coremap, except it
represents free blocks in the swap area in 512-byte units.

• calls— Specifies how many “call-out table” entries to allocate.
Each entry represents a function to be invoked at a later time
by the clock handler. The time unit is 1/60 of a second. The
call-out table is used by the terminal handlers to provide
terminal delays and by various other I/O handlers. When the
table overflows, the system will crash and print the panic
message “Timeout table overflow” on the console. This value
must be greater than two.

• procs— Specifies how many “process table” entries to allocate.
Each entry represents an active process. The scheduler is
always the first entry and init(1M) is always the second entry.
The number of entries depends on the number of terminal lines
available and the number of processes spawned by each user.
The average number of processes per user is in the range of 2
through 5. When full, the fork(2) system call will return the
error EAGAIN.




• sxt— VAX only: Specifies how many sxt devices (used for job
control) should be included in the system. The maximum value 
is 32. 

• texts— Specifies how many “text table” entries to allocate. 
Each entry represents an active read-only text segment. Such 
programs are created by using the -i or -n option of the
loader ld(1). The -n option is implicit on the VAX. When the
table overflows, the warning message “out of text” is printed 
on the console. 

• clists— Specifies how many “character list buffers” to allocate. 
On the PDP-11/70, each buffer contains up to 24 bytes; on the 
VAX, each buffer contains up to 64 bytes. The buffers are 
dynamically linked together to form input and output queues 
for the terminal lines and various other slow-speed devices. 
The average number of buffers needed per terminal line is in 
the range of 5 through 10. When full, input characters from 
terminals will be lost and not echoed. 

• maxproc— Specifies how many concurrent processes a 
nonsuperuser is allowed to run. 

• power— Specifies whether to attempt restart after a power 
failure. A value of 0 (default) indicates no restart; a value of 1 
attempts power-fail restart. On restart, device drivers are 
called and process 1 (i.e., init) is sent a hangup signal; see 
init(1M).

• sema— Specifies whether to include semaphore code. A value 
of 0 (default) indicates no semaphores; a value of 1 includes 
semaphores. 

• shmem— Specifies whether to include shared memory code. A 
value of 0 (default) indicates no shared memory; a value of 1 
includes shared memory. 

• mesg— Specifies whether to include message code. A value of 0 
(default) indicates no messages; a value of 1 includes messages. 




• shmmax— Specifies the maximum size of a shared memory 
segment. 

• shmmin— Specifies the minimum size of a shared memory 
segment. 

• shmmni— Specifies the maximum number of shared memory 
segments in the system. 

• shmseg— Specifies the maximum number of shared memory 
segments a user may have attached. 

• shmall— Specifies the maximum amount of shared memory
that may be allocated system wide. The default value is 512
clicks, 256 KB.

• shmbrk— Specifies the number of clicks between the end of the
data segment and the beginning of the first shared memory
segment when the default starting address is used. This gap
allows the user to continue to use sbrk(2) or brk(2). The
default value is 16 clicks, 8 KB.

• msgmax— Specifies the maximum message size. 

• msgmnb— Specifies the maximum number of bytes on any one 
queue. 

• msgtql— Specifies the number of system message headers, i.e., 
maximum number of outstanding messages. 

• msgssz— Specifies the message segment size. Messages consist 
of a set of contiguous message segments large enough to fit the 
text. The segments are used to help eliminate fragmentation 
and speed message buffer allocation. A message may span 
several segments. 

• msgseg— Specifies the number of message segments in the 
system. 

• msgmap— Specifies the message segment map size. 

• msgmni— Specifies the maximum number of message queues 
system wide. The default is 10. 




• semmap— Specifies the number of entries in the semaphore 
map. The map is used by the system to allocate and free 
semaphore sets. This parameter should be changed to reflect 
changes in semmns. 

• semmni— Specifies the number of semaphore identifiers, i.e., 
number of semaphore sets. 

• semmns— Specifies the number of semaphores in the system. 

• semmnu— Specifies the number of undo structures in the 
system. 

• semume— Specifies the maximum number of undo entries per 
structure. 

• semmsl— Specifies the maximum number of semaphores per 
semaphore identifier. 

• semopm— Specifies the maximum number of semaphore 
operations per semop(2) call. 

• maus— PDP-11/70 only: specifies whether to include maus 
(shared memory) code. A value of 0 (default) indicates no 
shared memory; a value of 1 includes the shared memory code. 

• vpmbsz— Specifies the amount of external buffer space (in 
bytes) available to the vpmt driver: on PDP-ll/70s, this space 
is allocated outside of the kernel address space. 
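
Two of the constraints stated in the list above lend themselves to a quick check before config is run: hashbuf must be a power of two, and the swap area must not overlap a file system. The sketch below is an illustration only; both function names are hypothetical, and swap_fits assumes the file system occupies the first FSSIZE blocks of the same section, as in the rp06 example.

```shell
# is_power_of_two N: succeed when N is a positive power of two
# (the requirement stated for hashbuf). Hypothetical helper.
is_power_of_two() {
    n=$1
    [ "$n" -gt 0 ] || return 1
    while [ $((n % 2)) -eq 0 ]; do
        n=$((n / 2))
    done
    [ "$n" -eq 1 ]
}

# swap_fits FSSIZE SWPLO NSWAP SECTIONSIZE: succeed when the swap area
# starts at or after the end of the file system and ends inside the
# section. All values are in blocks. Hypothetical helper.
swap_fits() {
    [ "$2" -ge "$1" ] && [ $(($2 + $3)) -le "$4" ]
}

if is_power_of_two 64; then
    echo "hashbuf 64 is acceptable"
fi
if swap_fits 6000 6000 2000 8000; then
    echo "swap at 6000 for 2000 blocks fits an 8000-block section"
fi
```

The second call uses the figures from the swap example: a 6000-block root file system followed by 2000 blocks of swap in an 8000-block section.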


UNIX System Generation 

Before generating the first UNIX system, the system administrator 
should modify the file called Makefile in the /usr/src/uts/*/cf 
directory. This file contains five symbols that are used for system 
identification; it is used to initialize the internal utsname structure 
[see uname(1) and uname(2)]. The five symbols (eight characters
maximum) are as follows: 

SYS     System name (e.g., pwba)

NODE    The name by which the system is known on the
        uucp(1C) network (e.g., pwba)

REL     The operating system release (e.g., 5.0)

VER     The current version of the system; this is
        usually four characters indicating when the
        system was made (e.g., 0620 for June 20)

MACH    The machine hardware name (e.g., vax-780).

Generally, only the first two symbols need local modification; the 
REL symbol is a constant for the duration of this release, while the 
VER symbol will be defined when you make(1) the system. The
name of the executable file produced by the generation procedure will 
be the concatenation of the SYS and VER symbols (e.g., pwba0620). 
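
The naming rule can be tried out before a build to see what the executable file will be called. This is a minimal sketch: kernel_name is a hypothetical helper, and deriving VER from the current date with date +%m%d is one convention that matches the "0620 for June 20" example, not a requirement of the release.

```shell
# kernel_name SYS VER: print the name of the executable file that the
# generation procedure would produce, after checking the eight-character
# limit on each symbol. Hypothetical helper for illustration only.
kernel_name() {
    for sym in "$1" "$2"; do
        if [ "${#sym}" -gt 8 ]; then
            echo "symbol too long: $sym" >&2
            return 1
        fi
    done
    echo "$1$2"
}

kernel_name pwba 0620
kernel_name pwba "$(date +%m%d)"    # VER taken from today's month and day
```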

To generate a new UNIX operating system, follow the procedure 
given: 


cd /usr/src/uts/*/cf
ed dfile
a
[information as described above]
.
w
q
config dfile
make VER=ver


The PDP-11/70 system has a relatively small address space. If the 
table sizes or the number of device types are too large, the program 
sysfix will print various error messages and the above procedure 
will only create an a.out file. In particular, the maximum available 
data space is 49,152 bytes. The actual data space requested can be 
found by using size(1) on a.out and adding the data and bss segment
sizes. One then reduces the specified values for the various system 
entries until it all fits. The amount of space in the bss segment used 
for each entry is indicated in the part “UNIX System Configuration”. 
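
The arithmetic described above is short enough to script. The sketch below is an illustration only: the function names are hypothetical, the data and bss figures are made-up stand-ins for real size(1) output, and the per-entry size of 24 is the PDP-11/70 value listed for inodes in Figure 3-1.

```shell
PDP_DATA_LIMIT=49152    # maximum available data space on the PDP-11/70

# fits_pdp DATA BSS: succeed when the data and bss segment sizes together
# stay within the limit. Hypothetical helper for illustration only.
fits_pdp() {
    [ $(($1 + $2)) -le "$PDP_DATA_LIMIT" ]
}

# saved_bytes OLD_COUNT NEW_COUNT SIZE: bytes of bss recovered by reducing
# a table from OLD_COUNT to NEW_COUNT entries of SIZE bytes each.
saved_bytes() {
    echo $((($1 - $2) * $3))
}

# Hypothetical figures standing in for the output of size(1) on a.out.
data=12000
bss=38000
if fits_pdp "$data" "$bss"; then
    echo "system fits"
else
    echo "over by $((data + bss - PDP_DATA_LIMIT)) bytes"
fi

# Effect of cutting inodes from 250 to 200 at 24 bytes per entry.
saved_bytes 250 200 24
```

Comparing the overshoot with the savings from each candidate reduction shows which table sizes are worth lowering before the system is remade.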




On the VAX, the combined data space should not exceed 200,000 
bytes. 


The PDP-11/70 system is distributed with a special overlay loader
that allows larger systems (text size greater than 64K) to be
configured. Such a system is made up of one main segment and
seven overlay text segments that use supervisor registers to switch
the overlay text segments. The main segment may be as large as 56K,
and the overlay text segments may not be greater than 8K each. To
invoke this new text scheme, either edit /usr/src/uts/*/Makefile to
change the TYPE symbol from id to ov or, when generating a new
system, use the following procedure:


cd /usr/src/uts/*/cf 
make VER=ver TYPE=ov 


The seven overlay text segments are described in the file
/usr/src/uts/pdp11/cf/SEGF. This file consists of two parts (as
shown below).

cat.o DR11C_0                                 (First Part)
csi.o CSI_0
da.o DA11B_0

x25s.o X25_0
x25u.o X25_0
*
dmb.o dmr.o maus.o msg.o sem.o                (Second Part)
nsc.o kl.o sys.o vp.o
vpmt.o trace.o lp.o err.o main.o ht.o dn.o
x25u.o cat.o da.o du.o st1.o
x25r.o nc.o st2.o
x25s.o x25m.o dmk.o pcl.o
csi.o tm.o


The first part associates object files with configurable devices. The
second part describes a possible layout of the seven overlay text
segments. Not all of the devices associated with the object files
described in some of the overlay segments may be configured
together because the overlay segment may become greater than 8K.
If this happens, the second part of the file
/usr/src/uts/pdp11/cf/SEGF will have to be modified. For example,
the x25 driver and the pcl driver may not be configured together
since overlay segment six would be greater than 8K. To correct this,
/usr/src/uts/pdp11/cf/SEGF will have to be edited. For example, if
the nsc driver is not configured, the nsc.o file in overlay segment two
may be replaced with pcl.o. The pcl.o file will have to be deleted
from overlay segment six. The second part of
/usr/src/uts/pdp11/cf/SEGF will then be as follows:


dmb.o dmr.o maus.o msg.o sem.o
pcl.o kl.o sys.o vp.o
vpmt.o trace.o lp.o err.o main.o ht.o dn.o
x25u.o cat.o da.o du.o st1.o
x25r.o nc.o st2.o
x25s.o x25m.o dmk.o
csi.o tm.o

After completing this, make a new UNIX system. The text size of
each object file may be found using size(1). The object files may be
extracted using ar(1) from /usr/src/uts/pdp11/libname where
libname is the appropriate library.
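
Once the text size of each object file is known, the total for a proposed overlay segment can be added up before SEGF is edited. The sketch below is an illustration only: segment_total is a hypothetical helper, 8K is taken to mean 8192 bytes, and the sizes shown are made-up values, not measurements of the real object files.

```shell
SEG_LIMIT=8192    # 8K overlay segment limit, assumed to mean 8192 bytes

# segment_total: read "object text_size" pairs on standard input and print
# the sum of the sizes. Hypothetical helper for illustration only.
segment_total() {
    total=0
    while read -r obj size; do
        [ -n "$size" ] || continue      # ignore blank or short lines
        total=$((total + size))
    done
    echo "$total"
}

# Made-up text sizes for the files listed in overlay segment six.
total=$(printf '%s\n' "x25s.o 3100" "x25m.o 2400" "dmk.o 1500" "pcl.o 1900" |
    segment_total)
if [ "$total" -le "$SEG_LIMIT" ]; then
    echo "segment fits ($total bytes)"
else
    echo "segment too large by $((total - SEG_LIMIT)) bytes"
fi
```

Running the same sum with pcl.o removed from the list shows whether moving that file to another segment is enough to bring the segment under the limit.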

When you are satisfied with the new system, test it by the following 
procedure: 

cp /usr/src/uts/*/sysver /    # sysver as above
cd /
umount /dev/dsk/0s1
rm /unix
ln /sysver /unix
sync


Halt the processor and reboot the system. Note that this procedure 
results in two names for the operating system object; the generic 
/unix and the actual name, say /pwba0620. An old system may be 
booted by referring to the actual name, but remember that many 
programs [such as ps(1)] use the generic name /unix to obtain the
name-list of the system. 



If the new system does not work, verify that the correct device 
addresses and interrupt vectors have been specified. On the PDP- 
11/70, if the wrong interrupt vector and the correct device address 
have been specified for a device, the operating system will print the 
error message “stray interrupt at XXX” when the device is accessed, 
where XXX is the correct interrupt vector. On the VAX, the message 
is “stray UBA interrupt at XXX”. On the PDP-11/70, if the wrong 
device address is specified, the system will crash with a panic trap of 
type 0 (indicating a time-out) when the device is accessed. On the 
VAX, the system will not crash directly but will print a “UBA” 
warning message indicating the failing address. 


For the VAX, a stand-alone test may be executed to compare the
configuration in /unix with the actual hardware. After the
stand-alone shell prompts, type /stand/vcf. The vcf(1M) command
reports on the MASSBUS adapter configuration and attempts to
determine the UNIBUS system device configuration. Any errors
should be corrected before continuing.


Special Files 

A special file must be made for every device on your system. 
Normally, all special files are located in the directory /dev. Initially, 
this directory will contain: 


     console               console terminal
     error                 see err(7)
     mem, kmem, null       see mem(7)
     tty                   see tty(7)
     rp[0-7], rrp[0-7]     disk drive 0, sections 0-7
     rl[0-1], rrl[0-1]     disk drives 0 and 1
     rk[0-1], rrk[0-1]     disk drives 0 and 1
     mt0, rmt0             tape drive 0
     mt4, rmt4             tape drive 0 (no rewind).


These special files are of two types— block and character. This is 
indicated by the character b or c in the listing produced by ls(1) with
the -l option.


In addition, each special file has a major device number and a minor
device number. The major device number refers to the device type
and is used as an index into either the bdevsw or cdevsw table in the
configuration file conf.c. The minor device number refers to a
particular unit of the device type and is used only by the driver for
that type. The config program with the -t option will list major
device numbers.


The program mknod(1M) creates special files. For example, the
following would create part of the initially-supplied /dev directory
(on the PDP-11/70):

     cd /dev
     mknod console c 0 0
     mknod error c 20 0
     mknod mem c 2 0; mknod kmem c 2 1; mknod null c 2 2
     mknod tty c 13 0
     mknod dsk/0s0 b 0 0; mknod rdsk/0s0 c 7 0
     mknod mt/0m b 1 0; mknod rmt/0m c 6 0
     mknod mt/0mn b 1 4; mknod rmt/0mn c 6 4


After the special files have been made, their access modes should be
changed to appropriate values by chmod(1). For example:

     cd /dev
     chmod 622 console
     chmod 444 error
     chmod 440 mem kmem
     chmod 666 null
     chmod 666 tty
     chmod 400 dsk/0s0 rdsk/0s0
     chmod 666 mt/0m rmt/0m
     chmod 666 mt/0mn rmt/0mn


Note that file names have no meaning to the operating system itself; 
only the major and minor device numbers are important. However, 
many programs expect that a particular file is a certain device. 
Thus, by convention, special files are named as follows: 


     block device               conf.c    /dev
     RP03 disk                  rp        dsk/*
     RP04/5/6 disk              hp        dsk/*
     RS03/4 fixed head disk     hs        dsk/*
     RK05 disk                  rk        dsk/*
     RL01/2 disk                rl        dsk/*
     RM05 disk                  hm        dsk/*
     RM80 disk                  gd        dsk/*
     RP07 disk                  gd        dsk/*
     general disk               gd        dsk/*
     TU10 tape                  tm        mt/*
     TE/TU16 tape               ht        mt/*
     TS11 tape                  ts        mt/*
     TU78 tape                  ts        mt/*
     general tape               gt        mt/*

     character device           conf.c    /dev
     DL11 async. line           kl        tty*, console
     DH11 async. line mux       dh        tty*
     DMC11 sync. unit           dmc       dmc*
     DZ11 async. line mux       dz        tty*
     DN11 auto call unit        dn        dn*
     DU11 sync. line            du        du*
     KMC11-B micro              kmc       kmc*
     DM11-BA modem control      dmk       dmk*
     DZ11/KMC11-B assist        dzb       tty*
     LP11 line printer          lp        lp*
     RP03 disk                  rp        rdsk/*
     RP04/5/6 disk              hp        rdsk/*
     RM05 disk                  hm        rdsk/*
     RM80 disk                  gt        rdsk/*
     RP07 disk                  gt        rdsk/*
     general disk               gd        rdsk/*
     RS03/4 fixed head disk     hs        rdsk/*
     TU10 tape                  tm        rmt/*
     TE/TU16 tape               ht        rmt/*
     TS11 tape                  ts        rmt/*
     TU78 tape                  gt        rmt/*
     general tape               gt        rmt/*
     error                      err       error
     memory                     mm        mem, kmem, null
     terminal                   sy        tty


See the section "New Tape and Disk Device Names" for more
information on /dev entry naming conventions for tape and disk
devices.



For those devices with a /dev name ending in *, this character is
replaced by a string of digits representing the minor device number.
For example:

     mknod /dev/dsk/2s4 b 0 024    # leading zero means octal
     mknod /dev/tty03 c 1 3

Note that for disks, an octal number scheme is maintained because
each drive is logically partitioned into eight sections. Thus,
/dev/dsk/2s4 refers to section 4 of physical drive 2. There is also a
special file, /dev/swap, that is used by the program ps(1). This file
must reflect the block device used for swapping and must be
readable. For example:

     rm /dev/swap
     mknod /dev/swap b 0 0
     chmod 440 /dev/swap
     chown sys /dev/swap; chgrp sys /dev/swap
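
The octal scheme for disk minor numbers described above amounts to
eight sections per drive. As a sketch only (disk_minor is a
hypothetical helper, not a system command), the minor number for a
given drive and section could be computed as:

```shell
# Hypothetical helper: minor number of a disk section, printed in
# octal with a leading zero as mknod expects for octal arguments.
# $1 = physical drive number, $2 = section (0 through 7)
disk_minor() {
    printf '0%o\n' $(( $1 * 8 + $2 ))
}

disk_minor 2 4    # section 4 of drive 2, as in /dev/dsk/2s4
```

The value printed for drive 2, section 4 is 024, the minor number
used in the mknod example for /dev/dsk/2s4.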


The minor device numbers for tapes are also encoded; the minor 
device number consists of the four bits shown in Figure 3-2. 


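     +---------+---------------+--------------+--------------+
     | DENSITY | 0 = REWIND    | DRIVE-SELECT | DRIVE-SELECT |
     | SELECT  | 1 = NO REWIND | BIT 1        | BIT 0        |
     +---------+---------------+--------------+--------------+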


Figure 3-2. Minor Device Number, Tape 


For the TU10 and TE16, the density select bit is 0 for 800 bpi and 1 
for 1600 bpi. For the TU78, the density select bit is 0 for 1600 bpi 
and 1 for 6250 bpi. For the TS11, the density select bit is ignored; the 
density is always 1600 bpi. Therefore, the special file for tape drive 
1, for operation at 1600-bpi, with no rewind on close would have a 
minor device number of 13 (binary 1101). The 2-bit drive number 
field permits a maximum of four drives per system. 
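
The tape encoding can be checked with a little arithmetic. The
following is a minimal sketch, assuming the four bits are laid out
from high order to low order as density select, rewind select, and
the two drive-select bits; tape_minor is a hypothetical helper.

```shell
# Hypothetical helper: build a tape minor device number.
# $1 = drive (0 through 3)
# $2 = 1 for no rewind on close, 0 for rewind
# $3 = density select bit (0 or 1)
tape_minor() {
    echo $(( ($3 << 3) | ($2 << 2) | ($1 & 3) ))
}

tape_minor 1 1 1    # drive 1, no rewind, density bit set
tape_minor 0 1 0    # drive 0, no rewind, density bit clear
```

The first call prints 13 (binary 1101), the value worked out in the
text for drive 1; the second prints 4, the minor number of the
no-rewind entry for drive 0 in the earlier mknod example.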


The minor numbers of the automatic calling unit (ACU) interface, 
DN11, are encoded in eight bits as shown in Figure 3-3. 



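     +-------------------+--------------------+------------------+
     | SHARED-ACU-SELECT | SYSTEM-UNIT-SELECT | LINE-UNIT-SELECT |
     |                   | DN11-AA            | DN11-DA          |
     | FOUR BITS         | TWO BITS           | TWO BITS         |
     +-------------------+--------------------+------------------+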


Figure 3-3. Minor Device Number, DN11 

The shared-ACU-select field is normally zero, which indicates the
standard 801 ACU hardware (one ACU, one phone line). For the new
shared ACU hardware (one ACU, several phone lines), the
shared-ACU-select field indicates the line to be used.
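
As an illustration of the DN11 encoding, the sketch below packs the
three fields into one minor number. It assumes the fields in
Figure 3-3 run from high-order to low-order bits in the order shown;
dn_minor is a hypothetical helper, not a system command.

```shell
# Hypothetical helper: build a DN11 minor device number.
# $1 = shared-ACU-select (0 through 15; 0 for standard 801 hardware)
# $2 = system unit, DN11-AA (0 through 3)
# $3 = line unit, DN11-DA (0 through 3)
dn_minor() {
    echo $(( (($1 & 15) << 4) | (($2 & 3) << 2) | ($3 & 3) ))
}

dn_minor 0 0 1    # standard ACU, system unit 0, line unit 1
dn_minor 9 0 0    # shared ACU, line 9, system unit 0, line unit 0
```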


File Systems 

Each physical pack is partitioned into eight logical sections. This 
partitioning is defined in the operating system by a table with eight 
entries. Each table entry is two words long. The first specifies how 
many blocks are in the section; the second specifies the starting 
cylinder; see hp(7) (RP04/5/6), hm(7) (RM05), rp07(7) (RP07), 
rm80(7) (RM80), and rp(7) (RP03) for default cylinder and block 
assignments. These values are described to the system in the header 
file /usr/include/sys/io.h. 

A file system starts at block 0 of a section of the disk and may be as 
large as the size of that section; if it is smaller than the size of a 
section, the remainder of that section is unused. Note that the 
sections themselves may overlap physical areas of the pack, but the 
file systems must never overlap. 

The program mkfs(1M) (for 1K-byte/block file systems) or omkfs
(for 512-byte/block file systems) is used to initialize a section of the
disk to be a file system. The length of each section of the disk is
specified in 512-byte sectors. When mkfs is used, it produces half
that number of 1K-byte file system blocks. The 1K-byte blocks
provide better throughput for the particular file system. A
512-byte/block file system can be made using omkfs in place of mkfs.
The number of physical disk sectors (512 bytes each) used to make
the entire file system would be the same for either command. In
future releases, the functions of mkfs and omkfs will be merged
into one command.



Next, the program labelit(1M) is used to label the file system with
its name and the name of the pack. Finally, the file system may be
checked for consistency by using fsck(1M). The file system may then
be mounted using mount(1M).
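
The sequence above can be sketched as a dry run that only prints the
commands. Everything specific here is an assumption made for
illustration: fs_plan is a hypothetical helper, the device, size,
names, and mount point are invented, and the argument forms shown for
mkfs, labelit, fsck, and mount should be checked against the
corresponding manual pages before use.

```shell
# Hypothetical dry run: print (do not execute) the commands that
# would build, label, check, and mount a file system.
# $1 = section name under /dev/dsk   $2 = size in 512-byte sectors
# $3 = file system name              $4 = pack name
# $5 = mount point
fs_plan() {
    echo "# $2 sectors give $(( $2 / 2 )) 1K-byte blocks"
    echo "mkfs /dev/dsk/$1 $2"
    echo "labelit /dev/dsk/$1 $3 $4"
    echo "fsck /dev/dsk/$1"
    echo "mount /dev/dsk/$1 $5"
}

fs_plan 1s2 20000 usr2 pack1 /usr2
```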


Job Control 

Job control is an optional feature for the VAX processor. The feature
consists of a user level command, shl(1), and a driver, sxt(7), that
supports the command. By using shl(1) a user can interact with
different shells from a single terminal. The shl(1) command can be
used to maintain different environments (i.e., different working
directories, different effective user ids, etc.) or allow a user to leave
and return to a foreground process.


Installation 

1. The system administrator should verify the shl(1) command has
been installed in the system. 

2. The system description file should contain the following line: 

sxt 0 0 0 n 

where n is the number of sxt devices desired. The maximum 
value is 32. 



3. The job control feature requires sxt devices under /dev. The sxt 
devices should be created in /dev using the following shell script: 

     major=(the major number found in /etc/master for sxt)
     minor=0
     cd /dev
     mkdir sxt
     chmod 755 sxt
     for link in 00 01 02 03 04 (... the number of devices desired)
     do
          for chan in 0 1 2 3 4 5 6 7
          do
               echo ${minor} ${link}${chan}
               mknod sxt${link}${chan} c ${major} ${minor}
               ln sxt${link}${chan} sxt/${link}${chan}
               minor=`expr ${minor} + 1`
          done
     done


For each device, some operating kernel space is required. If too many
devices are requested, the error message "SXT cannot allocate link
buffers" may appear on the console the first time shl is used. If this
occurs, either increase the size of the kernel or decrease the requested
number of sxt devices.


DZ11 Software with KMC11-B Assist 

KMC11-B microprocessors may be used to control DZ11 asynchronous 
multiplexers, thus off-loading terminal-oriented functions from the 
CPU. The KMC11-B does DMA input and output of data, character 
translations, tab expansions, etc. Each KMC11 can control up to four 
DZ11 multiplexers. Each system can support up to four KMC11 
microprocessors that are used for DZ11 assist for a total of 128 lines. 


Installation 

1. Generate a system by including each DZ11 to be controlled by a
KMC11 in the system description file. For example,

     dzb X Y Z

where X is the vector address, Y is the UNIBUS address, and Z is
the bus request level. Also include the KMC11 microprocessors
in the configuration file:

     kmc11 X Y Z

2. Update the file /etc/brc to execute dzbset before entering one of
the numbered init states. For each KMC11 used to control
DZ11s, include

     /etc/dzbset /dev/kmc? [ttyname]

where ? is the minor device number of that KMC11 and
ttyname is any tty device name that has the number of the DZ11 unit
to be associated encoded in its minor device number. For example,

     /etc/dzbset /dev/kmc0 /dev/tty00 /dev/tty08

associates KMC11 minor device 0 with DZ11 units 0 and 1. The
DZ11 number is specified by the order of appearance in the
configuration file. The first four DZ11 multiplexers must be
associated with one KMC11 and the next four must be associated
with another KMC11. The order in which the KMC11
microprocessors are specified is not significant.

3. Update the files /etc/brc and /etc/powerfail to execute dzbload 
before entering one of the numbered init states and after a power 
failure, respectively. Each KMC11 used to control a DZ11 must 
be loaded with microcode. For each KMC11 used to control a 
DZ11, include 

/etc/dzbload /dev/kmc? 

where ? is the minor device number of that KMC11. The 
dzbload command may also be executed while the system is 
running when an administrator deems it necessary to reload a 
KMC11 with microcode. 

4. Special files must be created for each KMC and DZ11 line.

     /etc/mknod /dev/kmc? c X 1
     /etc/mknod /dev/tty?? c Y Z

where X is the major device number of the KMC11 and ? is the
minor device number of the KMC11 controlling the DZ11
multiplexers, i.e., the KMC11 loaded with microcode in Step 3. Y
is the major device number of the dzb device as supplied by
config(1M). Z is the minor device number for a particular line
on a DZ11. This number is composed of two fields. The
low-order three bits are the line number relative to a DZ11. The next
four bits contain the unit number of the DZ11 controlling these
lines. Note that this number is the absolute DZ11 number, not
the number relative to the KMC11. For example,

     mknod /dev/tty?? c Y 041

specifies the second line (0 through 7) on the fifth DZ11. The
DZ11 number is specified by the order of appearance in the
configuration file.
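
The two-field minor number for a dzb line can be worked out as shown
in this sketch; dzb_minor is a hypothetical helper used only to
illustrate the arithmetic.

```shell
# Hypothetical helper: minor number of one line on a DZ11 that is
# assisted by a KMC11, printed in octal with a leading zero.
# $1 = absolute DZ11 unit number (0 through 15)
# $2 = line number on that DZ11 (0 through 7)
dzb_minor() {
    printf '0%o\n' $(( (($1 & 15) << 3) | ($2 & 7) ))
}

dzb_minor 4 1    # second line on the fifth DZ11
dzb_minor 1 0    # first line on the second DZ11
```

The first call prints 041, matching the mknod example; the second
prints 010 (decimal 8), the minor number of the first line on DZ11
unit 1.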

Virtual Protocol Machine (VPM) Devices 

KMC11-B microprocessors may be used to control DMC11-DA, -FA, 
-MD, or DMR11 single-line synchronous communications interfaces or 
DMS11-DA 8-line synchronous communications multiplexers (one per 
KMC). The combination of a KMC11 and a DMS11-DA is known as a 
KMS11. A DM11-BA modem control multiplexer is normally included 
with the KMS11 hardware. 


The KMC11 executes the link-level communications protocol and 
performs DMA data transfers between main memory and the 
communications device. A link-level protocol description in a high- 
level language is compiled on the host computer [see vpmc(1M)] and
then down-loaded into the KMC11 by the vpmstart(1M) command.


The common synchronous interface (CSI) contains a set of utility 
routines for communicating with the VPM interpreter in the KMC11. 
The VPM protocol driver provides a transparent user interface to the 
VPM interpreter. The CSI routines are used not only by the VPM 
protocol driver but also by several other protocol drivers such as the 
BX.25 Level 3 driver. 



Installation 

1. Generate a system. The system description file must contain a 
line of the form 

vpm 0 0 0 n 

where n is the number of VPM protocol-driver minor devices 
desired, and a line of the form 

csi 0 0 0 m 

where m is the number of CSI minor devices desired. One CSI 
minor device is required for each VPM protocol-driver minor 
device. If other drivers that use CSI are configured, additional 
CSI minor devices must be configured to support these drivers. 
The system description file must also contain a line of the form 

trace 0 0 0 k 

where k is the number of trace minor devices desired (normally 
one). Vpmbsz, a configurable parameter which gives the size of 
the buffer pool available to vpm, may be adjusted to be different 
from the default value found in /etc/master. Refer to master(4) 
for more information. 

2. Compile the protocol script using vpmc(1M):

     vpmc [-m] [-f] [-i hdlc[/kms]] -o script.o script.c

where script.c is the source file for the protocol description. The
-m option is used only if the protocol source requires expansion
with the m4(1) preprocessor; otherwise, the C preprocessor is
used. The -f option is used only if the protocol source is written
in ratfor; otherwise, the script is assumed to be written in C
language. The -i hdlc option is used to compile a script that
uses the bit-oriented (HDLC) version of the VPM interpreter;
otherwise, the character-oriented (BISYNC) version of the
interpreter will be used. If the script is to be loaded into a
KMS11 8-line multiplexer rather than into a KMC11 controlling a
single-line interface, the -i hdlc/kms option or the -i
bisync/kms option is needed.

3. Install the compiled protocol description in a convenient
directory such as /lib and modify the /etc/brc file to load the
compiled protocol description into the selected KMC11s each time
the system is rebooted (first transition to multiuser) using
vpmstart(1M). If power-fail recovery is desired, the
/etc/powerfail file should also be modified to reload the compiled
protocol description into the selected KMC after a power fail
occurs. The -r option to vpmstart is required when reloading
the KMC after a power fail.

4. Special files must be created for each VPM protocol-driver minor
device, each trace minor device, each KMC11, and each DM11 (if
used):

     /etc/mknod /dev/vpm?  c W ?
     /etc/mknod /dev/trace c X 0
     /etc/mknod /dev/kmc?  c Y ?
     /etc/mknod /dev/dmk?  c Z ?

where W, X, Y, and Z are the major device numbers of the VPM
protocol driver, trace driver, KMC11 driver, and DM11 driver,
respectively. (It is not necessary to make special files for the
CSI minor devices since these are never referenced explicitly.)

5. Modify the /etc/brc file so that it executes a vpmset(1M)
command for each VPM protocol-driver minor device. The
arguments to the vpmset command specify the protocol-driver
minor device, the KMC11, and, if the device is a KMS11, the
particular line on the KMS11:

     /etc/vpmset /dev/vpm? /dev/kmc? [n]

KMC11 lines are numbered starting with zero. The vpmset
commands should be executed on the first transition to
multiuser. An attempt to open a VPM protocol-driver minor
device for reading and/or writing will fail if the corresponding
vpmset command has not yet been executed.

6. If a DM11 modem control multiplexer is included with the 
KMS11 hardware, then for each DM11 configured in the system 
include the following line in the file /etc/brc: 


/etc/dmkset dmkname kmcname 


where dmkname is the device name of the DM11 and kmcname is 
the device name of the KMS11 with which the DM11 is 
associated. 


Hardware Installation and Switch Settings 

The KMC11-B microprocessor and DMC11-DA, -FA, -MD, or DMR11
line unit must be installed in adjacent slots of a PDP-11/70 or VAX 
backplane. Care should be taken not to exceed the dc power capacity 
of the cabinet in which the items are installed. The microprocessor 
and line unit are interconnected by a 1-foot ribbon cable. The 
DMC11-DA, -FA, or DMR11 line unit is connected to a suitable
synchronous modem by a DEC-supplied modem cable. If the HDLC 
interpreter is used, the modem must be optioned for full-duplex (4- 
wire) operation; at speeds above 1200 bits per second, this will 
normally require a private line. If the BISYNC interpreter is used, 
then the modem may be optioned for half-duplex or full-duplex 
operation; however, the modem’s clear-to-send lead must change state 
in response to assertion and lowering of request-to-send by the 
DMC11 or DMR11 (i.e. the modem must not be optioned to keep 
clear-to-send high). If the modem is constrained to keep clear-to- 
send high, then the request-to-send lead on the DMC11 or DMR11 
must be looped around to the clear-to-send of the DMC11 or DMR11 
to ensure its proper operation. The DMC11-DA has an RS-232
interface that is suitable for connection to data sets such as the 208 
and 209. The DMC11-FA has a CCITT V.35 interface. The DMR11 
line unit can be configured for a RS-232, RS-422-A, RS-423-A, or V.35 
interface. The DMC11-MD has an integral 56 KB modem; this unit 
must be connected by a pair of coaxial cables to another DMC11-MD. 
The device address and interrupt vector address switches on the 
KMC11 should be set for the selected addresses. The KMC11 should 
also be wired for the selected bus priority (normally 5). All switches
and jumpers on the DMC11 line unit should be in the normal
configuration prescribed by the relevant DEC maintenance manual, 
but with one exception— the NO CRC switch (switch S2 in switch 
pack number 1) should be in the ON position. The purpose of this 
switch setting is to inhibit hardware CRC generation. Hardware 
CRC generation is not used with the VPM software for this device. 
Switches for the DMR11 line unit should be set according to the DEC 
installation manual for the interface being used. Speed selection 
should also be compatible for the chosen interface. Unlike the 
DMC11, the DMR11 line unit does not have a switch setting that 
inhibits the CRC clear. This is performed by the VPM software. 


The procedures for installing a KMS11 are similar except that 

1. There are no switches to set on the DMS11-DA 8-line 
multiplexer. 

2. Modem cables are not supplied with the DMS11-DA. Instead, 
there is an 8-line distribution panel with an RS-232 connector for 
each line on the DMS11-DA. Each in-use line must be connected 
by a suitable cable to a synchronous modem. 

3. The DM11-BA modem control multiplexer is normally included 
as part of the KMS11 hardware package. If a KMS11 contains a 
DM11-BA, then that KMS11 may execute either character- 
oriented (BISYNC) or bit-oriented (HDLC) protocols. The data- 
terminal-ready and clear-to-send leads to the modem are 
controlled by the VPM software, via the DM11-BA. (In such 
cases, dmkset must be performed prior to down loading the 
compiled protocol into the KMS11.) If a DM11-BA is not 
included, then that KMS11 may execute only bit-oriented 
protocols. In this case, the modem connected to each in-use line 
should be optioned to hold the transmitter carrier continuously 
on. 

The KMC11 contains a hardware timer that is used by both the VPM 
and DZ11/KMC11 software. For proper operation with this software, 
the value of the hardware timer should be 75 microseconds. Newer 
versions of the KMC11 microprocessor (etch revisions E and later) 
contain a switch pack at location E82 that is used to select the value 
of the program timer. E82-8 ON provides a time-out value of 115 
milliseconds; E82-8 OFF yields the correct value of 75 microseconds.
(E82 1 through 7 are not used.) On earlier versions of the KMC11, the
time-out value is selected by installing the proper value of capacitor 
C40. For proper operation with VPM and DZ11/KMC11 software, the 
value of this capacitor should be 4700 pF. 


ADMINISTRATIVE FILES 


/etc/motd 

This file contains the message-of-the-day. It is printed by 
/etc/profile after every successful login; therefore, it should be kept
short and to the point. 


/etc/brc 

This file is executed prior to entering any of the numbered init states 
for the first time after a reboot. The file is generally used to clear 
the file /etc/mnttab and load KMC11s with their respective scripts.
It is important to remember this file is executed once per reboot and 
is controlled by /etc/inittab. 


/etc/powerfail 

This shell script is executed according to its line in /etc/inittab. It is 
used primarily to reload KMC11 scripts after a power failure has
been detected. 


/etc/rc 

On the transition between init states, /etc/init executes the shell 
script /etc/rc (which must have executable modes). The execution of 
this file is controlled by a line in /etc/inittab. For /etc/rc to 
properly handle the mounting and unmounting of file systems, the 
opening of tty lines, etc., it may need certain information that is 
present in /etc/utmp, namely, the new (current) state, the number of 
times this state has previously been entered, and the previous state. 


The following shell script fragment assigns this data to shell
variables and checks for entering init state ‘2’ for the first time:

     set `who -r`
     cur_mode=$7
     no_times=$8
     pre_mode=$9
     if [ \( ${cur_mode} = 2 \) -a \( ${no_times} = 0 \) ]
     then
          # commands to be executed when entering multiuser mode
          :
     fi


Note that these values are carried over between reboots, that is to 
say, when the system is rebooted into a single user initial state, these 
values, stored in /etc/utmp will reflect the state the system was in 
when it last went down. 
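
To show which fields of the who -r output the fragment relies on,
the sketch below splits a sample line in the same way. The sample
line is an assumption about the output layout, made up for
illustration, and run_level_fields is a hypothetical helper.

```shell
# Hypothetical helper: split one line of "who -r" output into words
# and report the three fields that /etc/rc uses.
# $1 = the line to examine (sample text here, not live output)
run_level_fields() {
    set -- $1
    echo "cur_mode=$7 no_times=$8 pre_mode=$9"
}

# Assumed sample: entering state 2 for the first time from state S.
run_level_fields '   .       run-level 2 Jan  1 00:00    2    0    S'
```

With this sample the seventh, eighth, and ninth words are the current
state, the number of previous entries into it, and the previous state.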

Note that the files /etc/rc, /etc/brc, /etc/inittab, /etc/powerfail, and
/etc/shutdown must be edited to suit local conditions; see brc(1M).


/etc/inittab 

This file is used by /etc/init to determine the processes to create or 
terminate in each init state. By convention, state ‘s’ is single user 
and state ‘2’ is multiuser. 


The following line may be used to indicate the default init state, that 
is, the state the system is to come up in (most likely single user): 


is:s:initdefault: 

The following lines arrange for appropriate execution of /etc/brc, 
/etc/rc, and /etc/powerfail: 

bc::bootwait:/etc/brc 1> /dev/console 2>&1 
rc::wait:/etc/rc 1> /dev/console 2>&1 
pf::powerfail:/etc/powerfail 1> /dev/console 2>&1 



To set up line /dev/tty00 for use by 1200-baud asynchronous
terminals, add the following:

     00:2:respawn:/etc/getty -t60 tty00 1200

The arguments to getty are the number of seconds to allow before
hanging up the line, the device name, an optional speed setting that
refers to an entry in /etc/gettydefs, an optional terminal type
described in getty(1M), and an optional line discipline.

To add or delete getty-login processes while the system is in 
multiuser mode, make the appropriate changes to /etc/inittab then 
issue the command /etc/init q. This forces /etc/init to reread 
/etc/inittab without having to change init states. 
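
When many lines are to be added, the entries can be generated rather
than typed. The following is a sketch only; getty_entry is a
hypothetical helper that prints an entry in the form shown above,
using the 60-second timeout from that example.

```shell
# Hypothetical helper: print an /etc/inittab entry for one tty line.
# $1 = two-digit line number (also used as the entry id)
# $2 = speed label, an entry name in /etc/gettydefs
# $3 = action field; defaults to respawn
getty_entry() {
    echo "$1:2:${3:-respawn}:/etc/getty -t60 tty$1 $2"
}

getty_entry 00 1200
getty_entry 01 1200 off
```

The printed lines would still have to be placed in /etc/inittab and
followed by /etc/init q for init to act on them.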


Again, this file must be edited for local conditions; see getty(8),
stgetty(8), init(8), gettydefs(4), and inittab(5).

/etc/passwd 

This file is used to describe each user to the system. A new line must
be added for each new user. Each line has seven fields separated by
colons:

1. Login name: normally 1 to 8 characters, first character 
alphabetic, the remainder alphanumeric, no uppercase 
characters. 

2. Encrypted password: initially null, filled in by passwd(1). The
encrypted password contains 13 bytes while the actual password 
is limited to a maximum of 8 bytes. The encrypted password 
may be followed by a comma and up to four more bytes of 
password “age” information. 

3. User ID: a number between 0 and 65,535; 0 indicates the 
superuser. User IDs 0 through 99 are reserved. 

4. Group ID: the default is group 1 (one). Group IDs 0 through 99 
are reserved. 

5. Accounting information: this field is used by various accounting 
programs. It usually contains the user name, department 
number, and account number. 


6. Login directory: full pathname (keep them reasonably short). 

7. Program name: if null, /bin/sh is invoked after a successful 
login. If present, the named program is invoked in place of 
/bin/sh . 

For example, 

     ghh::138:1:6824-G.H.Hurtz(4357):/usr/ghh:
     grk::244:1:6510-S.P.LeName(4466):/usr/grk:/bin/rsh

See also passwd(4), login(1), and passwd(1).
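
As an illustration of the seven-field layout, this sketch splits one
/etc/passwd line at the colons and reports the program that would be
started at login, falling back to /bin/sh when the last field is
null. The helper name login_program is hypothetical.

```shell
# Hypothetical helper: report the login program for one passwd line.
# $1 = a complete line in /etc/passwd format
login_program() {
    printf '%s\n' "$1" | {
        # Split on colons; a null seventh field leaves prog empty.
        IFS=: read -r name pw uid gid acct dir prog
        echo "${prog:-/bin/sh}"
    }
}

login_program 'ghh::138:1:6824-G.H.Hurtz(4357):/usr/ghh:'
login_program 'grk::244:1:6510-S.P.LeName(4466):/usr/grk:/bin/rsh'
```

For the two sample entries this prints /bin/sh and /bin/rsh
respectively.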


/etc/group 

This file is used to describe each group to the system. The system 
administrator must add a new line for each new group. Each line has 
four fields separated by colons: 

1. Group name: normally 1 to 8 characters, first character 
alphabetic, the remainder alphanumeric, no uppercase 
characters. 

2. Encrypted password: contains 13 bytes while the actual password 
is limited to a maximum of 8 bytes. 

3. Group ID: a number between 0 and 65,535. Group IDs 0 through 
99 are reserved. 

4. Login names: list of all login names in the group, separated by
commas; list of all login names that may use newgrp(1) to
become a member of the group.


Group passwords are strongly discouraged. See also group(4). 



/etc/profile 

When the shell is executed and is the leader of a process group, as is 
the case when it is invoked by login, it will read and execute the 
commands in /etc/profile before executing commands in the user’s 
.profile file. This allows the system administrator to set up a 
standard environment for all users (e.g., executing umask, setting 
shell variables, etc.) and take care of other housekeeping details (such 
as news -n). Note that in /etc/profile the shell variable $0
indicates the invocation: normal shell (-sh), restricted shell (-rsh),
or su command (-su).
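
A profile can branch on this value. The sketch below shows the idea
with a hypothetical helper, invocation_kind, that takes the value of
$0 as an argument so it can be tried outside a login; a real
/etc/profile would test "$0" directly in the case statement.

```shell
# Hypothetical helper: classify how the shell was invoked, given the
# value that $0 has inside /etc/profile.
invocation_kind() {
    case "$1" in
    -sh)  echo "normal shell" ;;
    -rsh) echo "restricted shell" ;;
    -su)  echo "su command" ;;
    *)    echo "other" ;;
    esac
}

invocation_kind -sh
invocation_kind -su
```

One use of such a test is to run login-time housekeeping for -sh and
-rsh while skipping it for -su.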


/etc/checklist 

This file contains a list of default devices to be checked for
consistency by the fsck(1M) program. The devices normally
correspond to those mounted when the system is in multiuser mode.
For example, a sample checklist would be

     /dev/dsk/0s0
     /dev/rdsk/0s1


Note that the root device is specified as a block device while all 
others are specified as character devices. Character devices can be 
checked faster than block devices. The root device is specified as a 
block device in order for the fsck program to detect when the root is 
being checked, so that any modifications to this file system will 
result in an immediate reboot request. 


/etc/shutdown 

This file contains procedures to gracefully shut down the system in
preparation for file save or scheduled downtime. Make sure that no
procedures appear after the transition to single user mode, since
they may not be completed before the transition occurs.



/etc/filesave and /etc/tapesave 

These files contain prototypes for local file saves. 

/usr/adm/pacct 

This file contains the process accounting information; see acct(1M).

/etc/wtmp

This file is the log of login processes. 


REGENERATING SYSTEM SOFTWARE 

System source is issued under the directory /usr/src. The 
subdirectories are named cmd (commands), lib (libraries), uts (the 
operating system), head (header files), and stand (stand-alone 
programs); see mk(8) for details on how to remake system software. 



Chapter 4 

AUTO CALL FACILITY INSTALLATION 
(DEC PROCESSORS) 

GENERAL 

Before using the procedures outlined in this chapter, the 
administrator should be familiar with the procedures and commands 
used to install a UNIX system and should have read the “SETTING 
UP THE UNIX SYSTEM” chapter of this guide. 

The UNIX system has the capability to utilize an 801-type Auto Call 
Unit (ACU) for setting up a facility for dialing-up and 
communicating with other systems. The user level interface for the 
device is the command cu. The system may also use the ACU for 
networking via the uucp(1C) programs [see cu(1C) and uucp(1C) in
the UNIX System V User Reference Manual]. A system 
administrator, however, must first make certain software 
adjustments to the system to enable the device to be used. These 
adjustments are listed below. 


PROCEDURES 

The following step-by-step procedures should ensure that the software
for an automatic call-up facility is properly installed. When
installing more than one facility, adjust the instructions
accordingly. Each facility needs an ACU interface (DN-11 line card),
an 801-type ACU, and a modem (except as noted below for AT&T 57A1
and 57B1 data units in shared ACU setups).

1. The ACU must first be optioned correctly and associated with a
dedicated dial-in port. For the following examples, /dev/tty00
will be the dedicated dial-in port.

The 801-type ACU contains the following switches and should be 
optioned as shown below. A 0 = open and a 1 = closed (i.e., side 
next to number is down); bracketed switches are missing on some 
models. 



     S1 = 1000[1]
     S2 = 0101
     S3 = 11010
     S4 = 11[00]


When using 212-type data sets, ensure the optioning of these 
also. 


     S1 = [0]001
     S2 = 110001100
     S3 = 11110000
     S4 = 00


When using the shared ACU facility, the 57B1 data unit will 
have to be properly optioned. If the 57B1 data unit is the only 
sharing circuit in an arrangement or the first in an arrangement 
using two data mountings, all the positions of switches S9 and 
S10 must be open. If the 57B1 is the second sharing circuit in an 
arrangement using two data mountings, all positions of switches 
S9 and S10 must be closed. There are no customer selected 
options. 

2. All make-busy (MB) and service-line (SL) switches on the data 
mountings should be in the up or off position (i.e., do not busy 
them out). The make-busy switches on the 57B1s should be in
the off position for the slots with data sets and in the make-busy 
position for unequipped or unused slots. 

To use the 212A data set at high speed, the high-speed button 
must be depressed. This is not software switchable. If high speed 
is chosen, any UNIX system called must have 1200-baud answer 
capability or connection will not be possible. 

3. Edit /etc/inittab and turn the getty process off for multiuser
state (e.g., state 2) by changing the flags field to off.

00:2:off:/etc/getty tty00 1200 #


This prevents anyone from dialing into this dedicated port.


4. Include the ACU interface device driver entry in your system 
description file with the appropriate unit information; for 
example: 

dn11 370 175200 4 1

5. Note the major device number generated by config -t. It will 
usually be a seven on VAX systems and a four on PDP-11 UNIX 
systems. 

6. Using the major device number, make the following device
entries in the /dev directory. The name dn0 is commonly used
with DEC UNIX systems.

# mknod /dev/dn0 c 7 0

# ln /dev/dn0 /dev/cua0

# ln /dev/tty00 /dev/cul0

The linked names cul0 and cua0 are simply aliases of the
line and access (ACU) devices; they are the names known to the
cu and uucp commands.

Some configurations may incorporate a shared ACU capability 
(57A1 and 57B1 data units) that enables one ACU interface 
device and one ACU to dial numbers and establish connections 
for up to 12 phone lines (data sets). 

In this configuration, the data sets are located in two mounting
racks, six to a rack. The data sets are numbered 1 through 6 in
the first rack and 9 through 14 in the second rack. The data set
number should be reflected in the high-order 4 bits of the minor
device number of the ACU interface device entries in /dev.

By comparison, installing four ACU facilities on a UNIX system
on a DEC processor would previously have required four DN-11
line cards, four 801-type ACUs, four data sets, and the following
nodes:


ENTRY        Minor Device #
/dev/dn0     0
/dev/dn1     1
/dev/dn2     2
/dev/dn3     3


With the shared ACU facility, only one DN-11 line card and one
ACU are required. Up to 12 data sets can be serviced and their
nodes would be:


ENTRY         Minor Device #
/dev/dn0      020     (16)
/dev/dn1      040     (32)
/dev/dn2      060     (48)
/dev/dn3      0100    (64)
/dev/dn4      0120    (80)
/dev/dn5      0140    (96)
/dev/dn6      0220    (144)
/dev/dn7      0240    (160)
/dev/dn8      0260    (176)
/dev/dn9      0300    (192)
/dev/dn10     0320    (208)
/dev/dn11     0340    (224)


7. Change the modes on the two devices to read/write by all.

# chmod 666 /dev/cua0 /dev/cul0

Note that the modes of the other names for these devices (dn0 and
tty00) change automatically because the names are links.

8. Ensure that an appropriate entry exists in the file
/usr/lib/uucp/L-devices.

ACU cul0 cua0 300

If the high speed was chosen on the 212A data set, this line
should specify a speed of 1200 baud instead of 300 baud.

9. After completing the above steps, make a new operating system, 
install it as /unix and reboot the system. 


Note: The above are only examples and should not 
necessarily be copied directly for your system. 
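
For the shared ACU facility described in step 6, the rule that the
data set number occupies the high-order 4 bits of the minor device
number can be sketched as a small shell calculation. This is an
illustration only: the function name ds_minor is hypothetical, the
major device number 7 is simply the one used in the earlier example,
and the mknod commands are printed rather than executed.

```sh
#!/bin/sh
# Sketch: derive the minor device number of a shared-ACU node from its
# data set number (1-6 in the first rack, 9-14 in the second).
# ds_minor is a hypothetical helper name, not part of the UNIX system.
ds_minor() {
    # Data set number in the high-order 4 bits: multiply by 16.
    echo $(( $1 * 16 ))
}

# Print (do not run) the mknod commands for all 12 data sets,
# assuming major device number 7 as in the example in step 6.
unit=0
for ds in 1 2 3 4 5 6 9 10 11 12 13 14
do
    echo "mknod /dev/dn$unit c 7 $(ds_minor "$ds")"
    unit=$(( unit + 1 ))
done
```

The printed minor numbers are decimal and correspond to the values in
parentheses in the shared ACU table.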


DIAGNOSING PROBLEMS 

If the above steps are followed precisely and the unit still does not 
work, the hardware should be checked out. Problems should be 
diagnosed in the following order: 

1. Ensure that the lock file (/usr/spool/uucp/LCK...) is not present
from earlier failed attempts with cu or uucp. 

2. Perform the self-tests on the data sets, the ACUs, and the 
sharing hardware (these tests are described in literature that 
comes with the devices). 

3. Verify that both the ACU and the data set are correctly optioned 
as described above. 

4. Check that the ACU interface (DN-11) is pulsing digits to the 
ACU. 

5. Determine that the ACU is dialing the correct number. 

6. Ensure the integrity of the data set by using it as a dial-up port. 

7. Determine that the cable leads are not defective. 


Chapter 5 

UNIX SYSTEM ACCOUNTING 

The UNIX system accounting provides methods to collect per-process 
resource utilization data, record connect sessions, monitor disk 
utilization, and charge fees to specific logins. A set of C language 
programs and shell procedures is provided to reduce this accounting 
data into summary files and reports. This chapter describes the
structure, implementation, and management of this accounting
system and discusses the reports generated and the meaning of
the columnar data.


Throughout this chapter, each reference of the form name(1M),
name(7), or name(8) refers to entries in the UNIX System V
Administrator Reference Manual. References to entries of the form
name(N), where "N" is the number 1 or 6 possibly followed by a
letter, refer to entry name in section N of the UNIX System V User
Reference Manual. If "N" is a number (2 through 5) possibly
followed by a letter, refer to entry name in section N of the UNIX
System V Programmer Reference Manual.


GENERAL 

The following list is a synopsis of the actions of the accounting 
system: 

• At process termination, the UNIX system kernel writes one 
record per process in /usr/adm/pacct in the form of acct.h. 

• The login and init programs record connect sessions by 
writing records into /etc/wtmp. Date changes, reboots, and 
shutdowns (via acctwtmp) are also recorded in this file. 

• The disk utilization programs acctdusg and diskusg break
down disk usage by login.

• Fees for file restores, etc., can be charged to specific logins 
with the chargefee shell procedure. 


• Each day the runacct shell procedure is executed via cron to 
reduce accounting data and produce summary files and reports. 

• The monacct procedure can be executed on a monthly or fiscal 
period basis. It saves and restarts summary files, generates a 
report, and cleans up the sum directory. These saved summary 
files could be used to charge users for UNIX system usage. 


FILES AND DIRECTORIES 

The /usr/lib/acct directory contains all of the C language programs 
and shell procedures necessary to run the accounting system. The 
adm login (currently user ID of 4) is used by the accounting system 
and has the login directory structure shown in Figure 5-1. 


                 /usr/adm
                    |
                   acct
                    |
          +---------+---------+
          |         |         |
         nite      sum      fiscal


Figure 5-1. Directory Structure of the “adm” Login 


The /usr/adm directory contains the active data collection files. (For 
a complete explanation of the files used by the accounting system, see 
Figure 5-2 at the end of this section.) The nite directory contains files 
that are reused daily by the runacct procedure. The sum directory 
contains the cumulative summary files updated by runacct. The 
fiscal directory contains periodic summary files created by monacct. 


DAILY OPERATION 

When the UNIX system is switched into multiuser mode,
/usr/lib/acct/startup is executed, which does the following:

1. The acctwtmp program adds a “boot” record to /etc/wtmp. 
This record is signified by using the system name as the login 
name in the wtmp record. 

2. Process accounting is started via turnacct. Turnacct on 
executes the accton program with the argument 
/usr/adm/pacct. 

3. The remove shell procedure is executed to clean up the saved 
pacct and wtmp files left in the sum directory by runacct. 

The ckpacct procedure is run via cron every hour of the day to 
check the size of /usr/adm/pacct. If the file grows past 1000 blocks 
(default), turnacct switch is executed. The advantage of having 
several smaller pacct files becomes apparent when trying to restart 
runacct after a failure processing these records. 
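
The size test that ckpacct applies can be pictured with the following
sketch. It is not the ckpacct source: the function name
pacct_over_limit is hypothetical, a block is assumed to be 512 bytes,
and the sketch only reports what would happen instead of running
turnacct switch.

```sh
#!/bin/sh
# Sketch of the size test described above. pacct_over_limit is a
# hypothetical helper; the real ckpacct procedure may differ.
# Assumption: one block is 512 bytes.
pacct_over_limit() {
    file=$1
    limit=${2:-1000}              # threshold in blocks (default 1000)
    [ -f "$file" ] || return 1    # no file, nothing to switch
    bytes=$(wc -c < "$file")
    blocks=$(( ($bytes + 511) / 512 ))
    [ "$blocks" -gt "$limit" ]
}

# Usage: report instead of actually running turnacct switch.
if pacct_over_limit /usr/adm/pacct 1000
then
    echo "pacct is past the limit: turnacct switch would be run"
fi
```

Keeping the threshold as an argument mirrors the text, which describes
1000 blocks as a default rather than a fixed value.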

The chargefee program can be used to bill users for file restores, 
etc. It adds records to /usr/adm/fee which are picked up and 
processed by the next execution of runacct and merged into the total 
accounting records. 

Runacct is executed via cron each night. It processes the active
accounting files, /usr/adm/pacct, /etc/wtmp,
/usr/adm/acct/nite/disktacct, and /usr/adm/fee. It produces
command summaries and usage summaries by login.


When the system is shut down using shutdown, the shutacct shell 
procedure is executed. It writes a shutdown reason record into 
/etc/wtmp and turns process accounting off. 

After the first reboot each morning, the computer operator should 
execute /usr/lib/acct/prdaily to print the previous day’s accounting 
report. 


SETTING UP THE ACCOUNTING SYSTEM 

In order to automate the operation of this accounting system, several 
things need to be done: 

1. If not already present, add this line to the /etc/rc file in the 
state 2 section: 


/bin/su - adm -c /usr/lib/acct/startup 

2. If not already present, add this line to /etc/shutdown to turn off 
the accounting before the system is brought down: 

/usr/lib/acct/shutacct


3. For most installations, the following three entries should be 
made in /usr/spool/cron/crontab/adm so that cron will 
automatically run the daily accounting. 


0 4 * * 1-6 /usr/lib/acct/runacct 2>/usr/adm/acct/nite/fd2log
0 2 * * 4 /usr/lib/acct/dodisk
5 * * * * /usr/lib/acct/ckpacct


4. To facilitate monthly merging of accounting data, the following 
entry in /usr/spool/cron/crontab/adm will allow monacct to 
clean up all daily reports and daily total accounting files and 
deposit one monthly total report and one monthly total 
accounting file in the fiscal directory. 

15 5 1 * * /usr/lib/acct/monacct 


The above entry takes advantage of the default action of 
monacct that uses the current month’s date as the suffix for the 
file names. Notice that the entry is executed at such a time as to 
allow runacct sufficient time to complete. This will, on the first 
day of each month, create monthly accounting files with the 
entire month’s data. 


5. The PATH shell variable should be set in /usr/adm/.profile to: 


PATH=/usr/lib/acct:/bin:/usr/bin 


RUNACCT 

Runacct is the main daily accounting shell procedure. It is normally 
initiated via cron during nonprime time hours. Runacct processes 
connect, fee, disk, and process accounting files. It also prepares daily 
and cumulative summary files for use by prdaily or for billing 
purposes. The following files produced by runacct are of particular 
interest: 


nite/lineuse    Produced by acctcon, which reads the wtmp file and
                produces usage statistics for each terminal line
                on the system. This report is especially useful
                for detecting bad lines. If the ratio of logoffs
                to logins exceeds about 3 to 1, there is a good
                possibility that the line is failing.

nite/daytacct   This file is the total accounting file for the
                previous day in tacct.h format.

sum/tacct       This file is the accumulation of each day’s
                nite/daytacct and can be used for billing
                purposes. It is restarted each month or fiscal
                period by the monacct procedure.

sum/daycms      Produced by the acctcms program. It contains
                the daily command summary. The ASCII
                version of this file is nite/daycms.

sum/cms         The accumulation of each day’s command
                summaries. It is restarted by the execution of
                monacct. The ASCII version is nite/cms.

sum/loginlog    Produced by the lastlogin shell procedure. It
                maintains a record of the last time each login
                was used.

sum/rprtMMDD    Each execution of runacct saves a copy of the
                daily report that can be printed by prdaily.


Runacct takes care not to damage files in the event of errors. A
series of protection mechanisms is used that attempts to recognize
an error, provide intelligent diagnostics, and terminate processing in
such a way that runacct can be restarted with minimal intervention.
It records its progress by writing descriptive messages into the file
active. (Files used by runacct are assumed to be in the nite
directory unless otherwise noted.) All diagnostic output during the
execution of runacct is written into fd2log. Runacct will complain
if the files lock and lock1 exist when invoked. The lastdate file
contains the month and day runacct was last invoked and is used to
prevent more than one execution per day. If runacct detects an
error, a message is written to /dev/console, mail is sent to root and
adm, locks are removed, diagnostic files are saved, and execution is
terminated.
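
The lock and lastdate checks described above might look roughly like
the following sketch. It is not the runacct source: the function name
may_run is hypothetical, it takes the nite directory as its argument,
and it only reports the condition instead of sending mail or writing
to the console.

```sh
#!/bin/sh
# Sketch of the start-up checks described above; not the actual
# runacct source. may_run is a hypothetical helper that takes the
# nite directory as its only argument.
may_run() {
    nite=$1
    # Refuse to start while either lock file exists.
    if [ -f "$nite/lock" ] || [ -f "$nite/lock1" ]
    then
        echo "ERROR: locks found, run aborted" >&2
        return 1
    fi
    # Refuse a second run on the same day; lastdate is assumed to
    # hold the output of date +%m%d from the previous run.
    if [ -f "$nite/lastdate" ] &&
       [ "$(cat "$nite/lastdate")" = "$(date +%m%d)" ]
    then
        echo "ERROR: acctg already run today: check $nite/lastdate" >&2
        return 1
    fi
    return 0
}
```

A caller would typically test the result before doing any work, for
example: may_run /usr/adm/acct/nite || exit 1.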


In order to allow runacct to be restartable, processing is broken 
down into separate reentrant states. A file is used to remember the 
last state completed. When each state completes, statefile is updated 
to reflect the next state. After processing for the state is complete, 
statefile is read and the next state is processed. When runacct 
reaches the CLEANUP state, it removes the locks and terminates. 
States are executed as follows: 


SETUP         The command turnacct switch is executed.
              The process accounting files, /usr/adm/pacct?,
              are moved to /usr/adm/Spacct?.MMDD. The
              /etc/wtmp file is moved to
              /usr/adm/acct/nite/wtmp.MMDD with the
              current time added on the end.

WTMPFIX       The wtmp file in the nite directory is checked
              for correctness by the wtmpfix program.
              Some date changes will cause acctcon1 to fail,
              so wtmpfix attempts to adjust the time stamps
              in the wtmp file if a date change record
              appears.

CONNECT1      Connect session records are written to ctmp in
              the form of ctmp.h. The lineuse file is created,
              and the reboots file is created showing all of the
              boot records found in the wtmp file.

CONNECT2      Ctmp is converted to ctacct.MMDD, which contains
              connect accounting records. (Accounting
              records are in tacct.h format.)

PROCESS       The acctprc1 and acctprc2 programs are
              used to convert the process accounting files,
              /usr/adm/Spacct?.MMDD, into total accounting
              records in ptacct?.MMDD. The Spacct and
              ptacct files are correlated by number so that if
              runacct fails, Spacct files that were already
              processed need not be reprocessed. One precaution
              should be noted: when restarting runacct in
              this state, remove the last ptacct file because it
              will not be complete.

MERGE         Merge the process accounting records with the
              connect accounting records to form daytacct.

FEES          Merge in any ASCII tacct records from the file
              fee into daytacct.

DISK          On the day after the dodisk procedure runs,
              merge disktacct with daytacct.

MERGETACCT    Merge daytacct with sum/tacct, the cumulative
              total accounting file. Each day, daytacct is
              saved in sum/tacctMMDD, so that sum/tacct
              can be recreated in the event it becomes
              corrupted or lost.

CMS           Merge in today’s command summary with the
              cumulative command summary file sum/cms.
              Produce ASCII and internal format command
              summary files.

USEREXIT      Any installation dependent (local) accounting
              programs can be included here.

CLEANUP       Clean up temporary files, run prdaily and save
              its output in sum/rprtMMDD, remove the locks,
              then exit.
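
The way a statefile makes a procedure restartable can be sketched as
follows. This is only an outline of the technique, not the runacct
source: the function name run_states is hypothetical, and each state
merely prints a message where the real procedure does its work.

```sh
#!/bin/sh
# Sketch of a restartable, state-driven procedure in the style
# described above. run_states is a hypothetical name; each state here
# only reports that it completed.
run_states() {
    statefile=$1
    # First invocation of the day: begin at SETUP.
    [ -f "$statefile" ] || echo SETUP > "$statefile"
    while :
    do
        state=$(cat "$statefile")
        case $state in
        SETUP)      next=WTMPFIX ;;
        WTMPFIX)    next=CONNECT1 ;;
        CONNECT1)   next=CONNECT2 ;;
        CONNECT2)   next=PROCESS ;;
        PROCESS)    next=MERGE ;;
        MERGE)      next=FEES ;;
        FEES)       next=DISK ;;
        DISK)       next=MERGETACCT ;;
        MERGETACCT) next=CMS ;;
        CMS)        next=USEREXIT ;;
        USEREXIT)   next=CLEANUP ;;
        CLEANUP)    echo "completed state CLEANUP"; return 0 ;;
        *)          echo "ERROR: Invalid state" >&2; return 1 ;;
        esac
        echo "completed state $state"
        # Record the next state so a restart resumes from there.
        echo "$next" > "$statefile"
    done
}
```

Because the statefile is rewritten only after a state finishes, a
failure part of the way through leaves the name of the unfinished
state on record, and a later invocation resumes at that state.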


RECOVERING FROM FAILURE 

The runacct procedure can fail for a variety of reasons, usually
a system crash, /usr running out of space, or a corrupted wtmp
file. If the activeMMDD file exists, check it first for error messages.
If the active file and lock files exist, check fd2log for any mysterious
messages. The following are error messages produced by runacct
and the recommended recovery actions:

ERROR: locks found, run aborted

The files lock and lock1 were found. These files must be removed
before runacct can restart.

ERROR: acctg already run for date : check /usr/adm/acct/nite/lastdate

The date in lastdate and today’s date are the same. Remove
lastdate.

ERROR: turnacct switch returned rc=?

Check the integrity of turnacct and accton. The accton
program must be owned by root and have the setuid bit set.

ERROR: Spacct?.MMDD already exists

File setups probably already run. Check status of files, then run
setups manually.

ERROR: /usr/adm/acct/nite/wtmp.MMDD already exists, run setup manually

Self-explanatory.


ERROR: wtmpfix errors see /usr/adm/acct/nite/wtmperror 

Wtmpfix detected a corrupted wtmp file. Use fwtmp to correct 
the corrupted file. 

ERROR: connect acctg failed: check /usr/adm/acct/nite/log 


The acctcon1 program encountered a bad wtmp file. Use
fwtmp to correct the bad file.

ERROR: Invalid state, check /usr/adm/acct/nite/active 

The file statefile is probably corrupted. Check statefile and read 
active before restarting. 


RESTARTING RUNACCT 

Runacct called without arguments assumes that this is the first 
invocation of the day. The argument MMDD is necessary if runacct 
is being restarted and specifies the month and day for which 
runacct will rerun the accounting. The entry point for processing is 
based on the contents of statefile. To override statefile, include the 
desired state on the command line. For example: 


To start runacct:

nohup runacct 2> /usr/adm/acct/nite/fd2log &

To restart runacct:

nohup runacct 0601 2> /usr/adm/acct/nite/fd2log &

To restart runacct at a specific state:

nohup runacct 0601 WTMPFIX 2> /usr/adm/acct/nite/fd2log &


FIXING CORRUPTED FILES 

Unfortunately, this accounting system is not entirely foolproof. 
Occasionally, a file will become corrupted or lost. Some of the files 
can simply be ignored or restored from the file save backup. 
However, certain files must be fixed in order to maintain the 
integrity of the accounting system. 


Fixing WTMP Errors 

The wtmp files seem to cause the most problems in the day-to-day 
operation of the accounting system. When the date is changed and 
the UNIX system is in multiuser mode, a set of date change records 
is written into /etc/wtmp. The wtmpfix program is designed to 
adjust the time stamps in the wtmp records when a date change is 
encountered. However, some combinations of date changes and 
reboots will slip through wtmpfix and cause acctcon1 to fail. The
following steps show how to patch up a wtmp file.


cd /usr/adm/acct/nite
fwtmp < wtmp.MMDD > xwtmp
ed xwtmp
    delete corrupted records or
    delete all records from beginning up to the date change
fwtmp -ic < xwtmp > wtmp.MMDD


If the wtmp file is beyond repair, create a null wtmp file. This will
prevent any charging of connect time. Acctprc1 will not be able to
determine which login owned a particular process, but it will be
charged to the login that is first in the password file for that user ID.


Fixing TACCT Errors 

If the installation is using the accounting system to charge users for 
system resources, the integrity of sum/tacct is quite important. 
Occasionally, mysterious tacct records will appear with negative 
numbers, duplicate user IDs, or a user ID of 65,535. First check 
sum/tacctprev with prtacct. If it looks all right, the latest
sum/tacct.MMDD should be patched up, then sum/tacct recreated. A
simple patchup procedure would be: 


cd /usr/adm/acct/sum
acctmerg -v < tacct.MMDD > xtacct
ed xtacct
    remove the bad records
    write duplicate uid records to another file
acctmerg -i < xtacct > tacct.MMDD
acctmerg tacctprev < tacct.MMDD > tacct


Remember that the monacct procedure removes all the tacct.MMDD
files; the files that remain therefore cover only the current fiscal
period, and sum/tacct can be recreated by merging them together.


UPDATING HOLIDAYS 

The file /usr/lib/acct/holidays contains the prime/nonprime table for
the accounting system. The table should be edited to reflect your 
location’s holiday schedule for the year. The format is composed of 
three types of entries: 

1. Comment Lines: Comment lines may appear anywhere in the file 
as long as the first character in the line is an asterisk. 

2. Year Designation Line: This line should be the first data line 
(noncomment line) in the file and must appear only once. The 
line consists of three fields of four digits each (leading white 
space is ignored). For example, to specify the year as 1982, prime 
time at 9:00 a.m., and nonprime time at 4:30 p.m., the following 
entry would be appropriate: 

1982 0900 1630 


A special condition allowed for in the time field is that the time 
2400 is automatically converted to 0000. 

3. Company Holidays Lines: These entries follow the year 
designation line and have the following general format: 


day-of-year Month Day Description of Holiday 


The day-of-year field is a number in the range of 1 through 366 
indicating the day for the corresponding holiday (leading white 
space is ignored). The other three fields are actually commentary 
and are not currently used by other programs. 
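
Since the day-of-year field is the only part of a holiday line that is
actually used, it is worth computing rather than counting by hand. The
following sketch shows one way to do so; the function name day_of_year
is hypothetical and is not part of the accounting system, and the
holiday shown is only an example.

```sh
#!/bin/sh
# Sketch: compute the day-of-year value for a holidays entry.
# day_of_year is a hypothetical helper, not part of the accounting
# system. Arguments: four-digit year, month (1-12), day of month.
day_of_year() {
    awk -v y="$1" -v m="$2" -v d="$3" 'BEGIN {
        split("31 28 31 30 31 30 31 31 30 31 30 31", len, " ")
        if ((y % 4 == 0 && y % 100 != 0) || y % 400 == 0)
            len[2] = 29              # leap year
        for (i = 1; i < m; i++)      # add the full months before m
            d += len[i]
        print d
    }'
}

# Usage: build one holiday line for December 25, 1982.
echo "$(day_of_year 1982 12 25)    Dec 25    Christmas Day"
```

The remaining fields on the printed line are commentary, as noted
above, so their exact wording is a matter of local preference.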


DAILY REPORTS 

Runacct generates five basic reports upon each invocation. They 
cover the areas of connect accounting, usage by person on a daily 
basis, command usage reported by daily and monthly totals, and a 
report of the last time users were logged in. 


The following paragraphs describe the reports and the meanings of 
their tabulated data. 


Daily Report 

In the first part of the report, the from/to banner should alert the
administrator to the period reported on. The period runs from the time
the last accounting report was generated until the time the current
accounting report was generated. The banner is followed by a log of
system reboots, shutdowns, power fail recoveries, and any other records
dumped into /etc/wtmp by the acctwtmp program [see acct(1M) in
the UNIX System V Administrator Reference Manual].


The second part of the report is a breakdown of line utilization. The 
TOTAL DURATION tells how long the system was in multiuser state 
(able to be accessed through the terminal lines). The columns are: 


LINE        The terminal line or access port.

MINUTES     The total number of minutes that line was in
            use during the accounting period.

PERCENT     The total number of MINUTES the line was in
            use divided by the TOTAL DURATION.

# SESS      The number of times this port was accessed for
            a login(1) session.

# ON        This column does not have much meaning
            anymore. It used to give the number of times
            that the port was used to log a user on; but
            since login(1) can no longer be executed
            explicitly to log in a new user, this column
            should be identical with # SESS.

# OFF       This column reflects not just the number of
            times a user logged off but also any interrupts
            that occur on that line. Generally, interrupts
            occur on a port when the getty(1M) is first
            invoked when the system is brought to
            multiuser state. Where this column does come
            into play is when the # OFF exceeds the # ON
            by a large factor. This usually indicates that
            the multiplexer, modem, or cable is going bad,
            or there is a bad connection somewhere. The
            most common cause of this is an unconnected
            cable dangling from the multiplexer.


During real time, /etc/wtmp should be monitored, as this is the file
from which the connect accounting is driven. If it grows rapidly,
execute acctcon1 to see which tty line is the noisiest. If the
interrupting is occurring at a furious rate, general system
performance will be affected.
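
Scanning the line utilization data for noisy lines can be automated
with a filter such as the sketch below. It rests on assumptions: the
input is taken to have six whitespace-separated columns in the order
LINE, MINUTES, PERCENT, # SESS, # ON, # OFF, which may not match the
actual lineuse layout, and the function name noisy_lines is
hypothetical. The default factor of 3 follows the rough 3 to 1 ratio
mentioned for nite/lineuse.

```sh
#!/bin/sh
# Sketch: print the lines whose # OFF count exceeds # ON by more than
# a given factor. Assumes six whitespace-separated columns in the
# order LINE MINUTES PERCENT #SESS #ON #OFF; the actual lineuse
# layout may differ. noisy_lines is a hypothetical helper.
noisy_lines() {
    awk -v factor="${1:-3}" '
        # Only data rows have numeric fifth and sixth fields, so
        # heading lines are skipped.
        $5 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ {
            on = ($5 > 0) ? $5 : 1      # avoid a zero baseline
            if ($6 > factor * on)
                print $1
        }'
}
```

The function reads the report on standard input, so it can be fed from
a saved copy of the report or from a pipeline.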


Daily Usage Report 

This report gives a by-user breakdown of system resource utilization. 
Its data consists of: 


UID               The user ID.

LOGIN NAME        The login name of the user; there can be
                  more than one login name for a single user
                  ID, and this identifies which one.

CPU (MINS)        This represents the amount of time the
                  user’s process used the central processing
                  unit. This category is broken down into
                  PRIME and NPRIME (nonprime)
                  utilization. The accounting system’s idea
                  of this breakdown is located in the
                  /usr/lib/acct/holidays file. As delivered,
                  prime time is defined to be 0900 through
                  1700 hours.

KCORE-MINS        This represents a cumulative measure of
                  the amount of memory a process uses
                  while running. The amount shown reflects
                  kilobyte segments of memory used per
                  minute. This measurement is also broken
                  down into PRIME and NPRIME amounts.

CONNECT (MINS)    This identifies “Real Time” used. What
                  this column really identifies is the amount
                  of time that a user was logged into the
                  system. If this time is rather high and the
                  column “# OF PROCS” is low, this user is
                  what is called a “line hog”. That is, this
                  person logs in first thing in the morning
                  and hardly touches the terminal the
                  rest of the day. Watch out for these kinds
                  of users. This column is also subdivided
                  into PRIME and NPRIME utilization.

DISK BLOCKS       When the disk accounting programs have
                  been run, the output is merged into the
                  total accounting record (tacct.h) and shows
                  up in this column. This disk accounting is
                  accomplished by the program acctdusg.

# OF PROCS        This column reflects the number of
                  processes that were invoked by the user.
                  This is a good column to watch for large
                  numbers indicating that a user may have a
                  shell procedure that runs amok.

# OF SESS         This is how many times the user logged
                  onto the system.

# DISK SAMPLES    This indicates how many times the disk
                  accounting was run to obtain the average
                  number of DISK BLOCKS listed earlier.

FEE               An often unused field in the total
                  accounting record, the FEE field
                  represents the total accumulation of
                  widgets charged against the user by the
                  chargefee shell procedure [see
                  acctsh(1M)]. The chargefee procedure
                  is used to levy charges against a user for
                  special services performed such as file
                  restores, etc.


Daily Command and Monthly Total Command Summaries 

These two reports are virtually the same except that the Daily
Command Summary only reports on the current accounting period
while the Monthly Total Command Summary covers the period from the
start of the fiscal period to the current date. In other words, the
monthly report reflects the data accumulated since the last
invocation of monacct.


The data included in these reports gives an administrator an idea of
which commands are most heavily used and, based on those commands’
characteristics of system resource utilization, a hint as to what to
weigh more heavily when tuning the system.

These reports are sorted by TOTAL KCOREMIN, which is an
arbitrary yardstick but often a good one for calculating "drain" on a
system.

COMMAND NAME      This is the name of the command.
                  Unfortunately, all shell procedures are
                  lumped together under the name sh since
                  only object modules are reported by the
                  process accounting system. The
                  administrator should monitor the
                  frequency of programs called a.out or
                  core or any other name that does not
                  seem quite right. Often people like to
                  work on their favorite version of
                  backgammon, only they do not want
                  everyone to know about it. Acctcom is
                  also a good tool to use for determining
                  who executed a suspiciously named
                  command and also if superuser privileges
                  were used.

NUMBER CMDS       This is the total number of invocations of
                  this particular command.

TOTAL KCOREMIN    The total cumulative measurement of the
                  amount of kilobyte segments of memory
                  used by a process per minute of run time.

TOTAL CPU-MIN     The total processing time this program
                  has accumulated.

TOTAL REAL-MIN    The total real-time (wall-clock) minutes
                  this program has accumulated. This total
                  is the actual “waited for” time as opposed
                  to kicking off a process in the background.

MEAN SIZE-K       This is the mean of the TOTAL
                  KCOREMIN over the number of
                  invocations reflected by NUMBER CMDS.

MEAN CPU-MIN      This is the mean of the TOTAL CPU-MIN
                  over the number of invocations reflected
                  by NUMBER CMDS.

HOG FACTOR        This is a relative measurement of the ratio
                  of system availability to system
                  utilization. It is computed by the formula

                      (total CPU time) / (elapsed time)

                  This gives a relative measure of the total
                  available CPU time consumed by the
                  process during its execution.

CHARS TRNSFD      This column, which may go negative, is a
                  total count of the number of characters
                  pushed around by the read(2) and
                  write(2) system calls.

BLOCKS READ       A total count of the physical block reads
                  and writes that a process performed.
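
The HOG FACTOR formula can be worked through with a short
calculation. The sketch below is illustrative only: the function name
hog_factor is hypothetical, its arguments are taken to be total CPU
minutes and total real (elapsed) minutes, and the guard against a zero
elapsed time is an assumption rather than documented behavior.

```sh
#!/bin/sh
# Sketch: compute the hog factor from the formula above.
# hog_factor is a hypothetical helper; arguments are total CPU
# minutes and total real (elapsed) minutes.
hog_factor() {
    awk -v cpu="$1" -v real="$2" 'BEGIN {
        if (real > 0) printf "%.2f\n", cpu / real
        else          print "0.00"      # assumed guard, see above
    }'
}

# Usage: a command that used 3 CPU minutes over 12 elapsed minutes.
hog_factor 3 12
```

A value near 1 indicates a command that kept the processor busy for
most of its elapsed time; a value near 0 indicates one that spent most
of its time waiting.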


Last Login 

This report simply gives the date when a particular login was last 
used. This could be a good source for finding likely candidates for 
the archives or getting rid of unused logins and login directories. 


SUMMARY 

The UNIX system accounting was designed from a UNIX system 
administrator’s point of view. Every possible precaution has been 
taken to ensure that the system will run smoothly and without error. 
It is important to become familiar with the C programs and shell 
procedures. The manual pages should be studied, and it is advisable 
to keep a printed copy of the shell procedures handy. The accounting 
system should be easy to maintain, provide valuable information for 
the administrator, and provide accurate breakdowns of the usage of 
system resources for charging purposes. 


Files in the /usr/adm directory

diskdiag        diagnostic output during the execution of
                disk accounting programs

dtmp            output from the acctdusg program

fee             output from the chargefee program, ASCII
                tacct records

pacct           active process accounting file

pacct?          process accounting files switched via
                turnacct

Spacct?.MMDD    process accounting files for MMDD during
                execution of runacct

Files in the /usr/adm/acct/nite directory

active          used by runacct to record progress and
                print warning and error messages

activeMMDD      same as active after runacct detects an error

cms             ASCII total command summary used by
                prdaily

ctacct.MMDD     connect accounting records in tacct.h
                format

ctmp            output of acctcon1 program, connect session
                records in ctmp.h format

daycms          ASCII daily command summary used by
                prdaily

daytacct        total accounting records for 1 day in tacct.h
                format

Figure 5-2. Accounting System Files (Sheet 1 of 3)


disktacct         disk accounting records in tacct.h format,
                  created by dodisk procedure

fd2log            diagnostic output during execution of
                  runacct (see cron entry)

lastdate          last day runacct executed in date +%m%d
                  format

lock lock1        used to control serial use of runacct

lineuse           tty line usage report used by prdaily

log               diagnostic output from acctcon1

logMMDD           same as log after runacct detects an error

reboots           contains beginning and ending dates from
                  wtmp, and a listing of reboots

statefile         used to record current state during
                  execution of runacct

tmpwtmp           wtmp file corrected by wtmpfix

wtmperror         place for wtmpfix error messages

wtmperrorMMDD     same as wtmperror after runacct detects
                  an error

wtmp.MMDD         previous day’s wtmp file

Files in the /usr/adm/acct/sum directory

cms               total command summary file for current
                  fiscal in internal summary format

cmsprev           command summary file without latest
                  update

Figure 5-2. Accounting System Files (Sheet 2 of 3)


daycms          command summary file for yesterday in
                internal summary format

loginlog        created by lastlogin

pacct.MMDD      concatenated version of all pacct files for
                MMDD, removed after reboot by remove
                procedure

rprtMMDD        saved output of prdaily program

tacct           cumulative total accounting file for current
                fiscal

tacctprev       same as tacct without latest update

tacctMMDD       total accounting file for MMDD

wtmp.MMDD       saved copy of wtmp file for MMDD,
                removed after reboot by remove procedure

Files in the /usr/adm/acct/fiscal directory

cms?            total command summary file for fiscal ? in
                internal summary format

fiscrpt?        report similar to prdaily for fiscal ?

tacct?          total accounting file for fiscal ?


Figure 5-2. Accounting System Files (Sheet 3 of 3)



Chapter 6 

FILE SYSTEM CHECKING 

The File System Check Program (fsck) is an interactive file system 
check and repair program. Fsck uses the redundant structural 
information in the UNIX system file system to perform several 
consistency checks. If an inconsistency is detected, it is reported to 
the operator, who may elect to fix or ignore each inconsistency. 
These inconsistencies result from the permanent interruption of the 
file system updates, which are performed every time a file is 
modified. Fsck is frequently able to repair corrupted file systems 
using procedures based upon the order in which the UNIX system 
honors these file system update requests. 


The purpose of this chapter is to describe the normal updating of the 
file system, to discuss the possible causes of file system corruption, 
and to present the corrective actions implemented by fsck. Both the 
program and the interaction between the program and the operator 
are described. 


Appendix 6-1 contains the fsck error conditions. The meanings of 
the various error conditions, possible responses, and related error 
conditions are explained. 


GENERAL 

When a UNIX operating system is brought up, a consistency check of 
the file systems should always be performed. This precautionary 
measure helps to ensure a reliable environment for file storage on 
disk. If an inconsistency is discovered, corrective action must be 
taken. 

The updating of the file system and the ways in which a file system
becomes corrupted are described in this chapter. Finally, the set of
heuristically sound corrective actions used by fsck is presented.





FSCK 


System Administrator Advice 

Remember that system buffers are 1024 bytes. When configuring the 
operating system, take into consideration that the same number of 
buffers as before will use more main memory. Weigh this against 
reducing the number of buffers, which reduces the cache hit ratio and 
degrades performance. 


UPDATE OF THE FILE SYSTEM 

Every working day hundreds of files are created, modified, and 
removed. Every time a file is modified, the UNIX operating system 
performs a series of file system updates. These updates, when 
written on disk, yield a consistent file system. To understand what 
happens in the event of a permanent interruption in this sequence, it 
is important to understand the order in which the update requests 
were probably being honored. Knowing which pieces of information 
were probably written to the file system first, heuristic procedures 
can be developed to repair a corrupted file system. 


There are five types of file system updates. These involve the 
superblock, inodes, indirect blocks, data blocks (directories and files), 
and free-list blocks. 


Superblock 

The superblock contains information about the size of the file system, 
the size of the inode list, part of the free-block list, the count of free 
blocks, the count of free inodes, and part of the free-inode list. 

The superblock of a mounted file system (the root file system is 
always mounted) is written to the file system whenever the file 
system is unmounted or a sync command is issued. 




Inodes 

An inode contains information about the type of inode (directory, 
data, or special), the number of directory entries linked to the inode, 
the list of blocks claimed by the inode, and the size of the inode. 

An inode is written to the file system upon closure of the file
associated with the inode. (All in-core blocks are also written to
the file system upon issue of a sync system call.)


Indirect Blocks 

There are three types of indirect blocks: single-indirect,
double-indirect, and triple-indirect. A single-indirect block
contains a list of some of the block numbers claimed by an inode.
Each one of the 128 entries in an indirect block is a data-block
number. A double-indirect block contains a list of single-indirect
block numbers. A triple-indirect block contains a list of
double-indirect block numbers.


Indirect blocks are written to the file system whenever they have 
been modified and released by the operating system. More precisely, 
they are queued for eventual writing. Physical I/O is deferred until 
the buffer is needed by the UNIX system or a sync command is 
issued. 
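
As a rough illustration of the three levels of indirection, the
sketch below maps a logical block number within a file to the level
of indirection needed to reach it. The figure of 128 entries per
indirect block comes from the text above; the count of 10 direct
block addresses held in the inode, and the function name, are
assumptions made for the example only.

```python
ENTRIES_PER_INDIRECT = 128   # from the text: entries in one indirect block
DIRECT_BLOCKS = 10           # assumption: direct addresses kept in the inode

def indirection_level(logical_block):
    """Return 0 for a direct block, 1, 2, or 3 for single-, double-,
    or triple-indirect access."""
    if logical_block < 0:
        raise ValueError("negative block number")
    remaining = logical_block - DIRECT_BLOCKS
    if remaining < 0:
        return 0
    span = ENTRIES_PER_INDIRECT          # blocks reachable at this level
    for level in (1, 2, 3):
        if remaining < span:
            return level
        remaining -= span
        span *= ENTRIES_PER_INDIRECT
    raise ValueError("block number beyond triple-indirect range")
```

Under these assumptions, logical blocks 0 through 9 are direct, the
next 128 are reached through the single-indirect block, and so on.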


Data Blocks 

A data block may contain file information or directory entries. Each 
directory entry consists of a file name and an inode number. 

Data blocks are written to the file system whenever they have been 
modified and released by the operating system. 


First Free-List Block 

The superblock contains the first free-list block. The free-list blocks 
are a list of all blocks that are not allocated to the superblock, inodes, 
indirect blocks, or data blocks. Each free-list block contains a count 
of the number of entries in this free-list block, a pointer to the next 
free-list block, and a partial list of free blocks in the file system. 




Free-list blocks are written to the file system whenever they have 
been modified and released by the operating system. 


CORRUPTION OF THE FILE SYSTEM 

A file system can become corrupted in a variety of ways. Improper 
shutdown procedures and hardware failures are the most common. 


Improper System Shutdown and Startup 

File systems may become corrupted when proper shutdown 
procedures are not observed, e.g., forgetting to sync the system prior 
to halting the CPU, physically write-protecting a mounted file 
system, or taking a mounted file system off-line. 


File systems may also become further corrupted by allowing a 
corrupted file system to be used (and, thus, to be modified further). 


Hardware Failure 

Any piece of hardware can fail at any time. Failures can be as subtle 
as a bad block on a disk platter or as blatant as a nonfunctional disk 
controller. 


DETECTION AND CORRECTION OF 
CORRUPTION 

A quiescent file system (one that is unmounted and not being written
to) may be checked for structural integrity by performing
consistency checks on the redundant data intrinsic to a file system.
The redundant data is either read from the file system or computed
from other known values. A quiescent state is important during the
checking of a file system because of the multipass nature of the fsck
program.


When an inconsistency is discovered, fsck reports the inconsistency
so that the operator can choose a corrective action.




This part describes how inconsistencies are discovered, and the
possible corrective actions, for the superblock, the inodes, the
indirect blocks, the data blocks containing directory entries, and
the free-list blocks. These corrective actions can be performed
interactively by the fsck command under control of the operator.


Superblock 

One of the most common corrupted items is the superblock. The 
superblock is prone to corruption because every change to the file 
system’s blocks or inodes modifies the superblock. 


The superblock and its associated parts are most often corrupted 
when the computer is halted and the last command involving output 
to the file system was not a sync command. 

The superblock can be checked for inconsistencies involving file 
system size, inode-list size, free-block list, free-block count, and the 
free-inode count. 


File System Size and Inode-List Size 

The file system size must be larger than the number of blocks used 
by the superblock and the number of blocks used by the list of inodes. 
The number of inodes must be less than 65,535. The file system size 
and inode-list size are critical pieces of information to the fsck 
program. While there is no way to actually check these sizes, fsck 
can check for them being within reasonable bounds. All other checks 
of the file system depend on the correctness of these sizes. 
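
The bounds check described above might be sketched as follows. Only
the limit of 65,535 inodes is taken from the text; the number of
inodes per block and the number of reserved blocks ahead of the inode
list are assumptions chosen for illustration, as is the function
name.

```python
MAX_INODES = 65535        # bound stated in the text
INODES_PER_BLOCK = 16     # assumption: 64-byte inodes in 1024-byte blocks
RESERVED_BLOCKS = 2       # assumption: boot block plus superblock

def sizes_plausible(fsize, isize):
    """fsize: blocks in the file system; isize: blocks in the inode list.
    Returns True when both sizes fall within reasonable bounds."""
    if isize <= 0 or fsize <= RESERVED_BLOCKS + isize:
        return False
    return isize * INODES_PER_BLOCK < MAX_INODES
```

A check of this kind cannot prove the sizes correct; it can only
reject values that are clearly impossible.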


Free-Block List 

The free-block list starts in the superblock and continues through the 
free-list blocks of the file system. Each free-list block can be checked 
for a list count out of range, for block numbers out of range, and for 
blocks already allocated within the file system. A check is made to 
see that all the blocks in the file system were found. 


The first free-block list is in the superblock. Fsck checks the list
count for a value of less than 0 or greater than 50. It also checks
each block number for a value of less than the first data block in the
file system or greater than the last block in the file system. Then it
compares each block number to a list of already allocated blocks. If
the free-list block pointer is nonzero, the next free-list block is read
in and the process is repeated.

When all the blocks have been accounted for, a check is made to see 
if the number of blocks used by the free-block list plus the number of 
blocks claimed by the inodes equals the total number of blocks in the 
file system. 

If anything is wrong with the free-block list, then fsck may rebuild 
the list, excluding all blocks in the list of allocated blocks. 
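
The free-block list checks can be sketched as a walk over a chain of
free-list blocks. The in-memory representation used here (tuples of
count, next pointer, and block numbers, with the chain held in a
dictionary) is an assumption made for the example and is not the
on-disk layout; the limit of 50 entries is from the text.

```python
MAX_FREE_ENTRIES = 50   # upper bound on the list count, from the text

def check_free_list(first, chain, first_data, last_block, allocated):
    """Walk the free-block list and collect problems.

    first:     (count, next_pointer, [block numbers]) from the superblock
    chain:     dict mapping a free-list block number to the same tuple
    allocated: set of block numbers already claimed by inodes
    Returns (set_of_free_blocks, list_of_problem_strings).
    """
    free, problems, visited = set(), [], set()
    node = first
    while True:
        count, next_block, blocks = node
        if count < 0 or count > MAX_FREE_ENTRIES:
            problems.append("list count %d out of range" % count)
            break
        for b in blocks[:count]:
            if b < first_data or b > last_block:
                problems.append("block %d out of range" % b)
            elif b in allocated or b in free:
                problems.append("block %d already claimed" % b)
            else:
                free.add(b)
        if next_block == 0:          # a zero pointer ends the chain
            break
        if next_block in visited or next_block not in chain:
            problems.append("bad free-list pointer %d" % next_block)
            break
        visited.add(next_block)
        node = chain[next_block]
    return free, problems
```

When the walk reports no problems, the final accounting check is
that the number of free blocks found plus the number of allocated
blocks equals the number of blocks in the data area.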


Free-Block Count 

The superblock contains a count of the total number of free blocks 
within the file system. Fsck compares this count to the number of 
blocks it found free within the file system. If the counts do not 
agree, then fsck may replace the count in the superblock by the 
actual free-block count. 


Free-Inode Count 

The superblock contains a count of the total number of free inodes 
within the file system. Fsck compares this count to the number of 
inodes it found free within the file system. If the counts do not 
agree, then fsck may replace the count in the superblock by the 
actual free-inode count. 


Inodes 

An individual inode is not as likely to be corrupted as the superblock.
However, because of the great number of active inodes, corruption is
almost as likely to occur somewhere in the inode list as in the
superblock.

The list of inodes is checked sequentially starting with inode 1 (there 
is no inode 0) and going to the last inode in the file system. Each 
inode can be checked for inconsistencies involving format and type, 
link count, duplicate blocks, bad blocks, and inode size. 




Format and Type 

Each inode contains a mode word. This mode word describes the type 
and state of the inode. Inodes may be one of four types: 


• Regular

• Directory

• Special block

• Special character.

If an inode is not one of these types, then the inode has an illegal 
type. Inodes may be found in one of three states— unallocated, 
allocated, and neither unallocated nor allocated. This last state 
indicates an incorrectly formatted inode. An inode can get in this 
state if bad data is written into the inode list through, for example, a 
hardware failure. The only possible corrective action is for fsck to 
clear the inode. 
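
The classification of a mode word might be sketched as below. The
type-bit values come from Python's stat module and are assumed to
correspond to the on-disk values; treating an all-zero mode word as
"unallocated" and a mode word with permission bits but no type bits
as "partially allocated" is likewise an assumption made for
illustration.

```python
import stat

# The four legal inode types named in the text.
LEGAL_TYPES = {stat.S_IFREG, stat.S_IFDIR, stat.S_IFBLK, stat.S_IFCHR}

def inode_state(mode):
    """Classify an inode mode word."""
    if mode == 0:
        return "unallocated"
    file_type = stat.S_IFMT(mode)        # isolate the type bits
    if file_type == 0:
        return "partially allocated"     # candidate for clearing
    if file_type in LEGAL_TYPES:
        return "allocated"
    return "unknown type"                # illegal type
```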


Link Count 

Contained in each inode is a count of the total number of directory 
entries linked to the inode. Fsck verifies the link count of each 
inode by traversing down the total directory structure, starting from 
the root directory, and calculating an actual link count for each 
inode. 

If the stored link count is nonzero and the actual link count is zero, it 
means that no directory entry appears for the inode. If the stored 
and actual link counts are nonzero and unequal, a directory entry 
may have been added or removed without the inode being updated. 


If the stored link count is nonzero and the actual link count is zero, 
fsck can, under operator control, link the disconnected file to the 
lost+found directory. If the stored and actual link counts are 
nonzero and unequal, fsck can replace the stored link count by the 
actual link count. 
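
The link count comparison can be sketched as a traversal of the
directory structure from the root. The dictionary representation of
directories and the function names are assumptions made for the
example; the suggested actions follow the two cases described above.

```python
from collections import Counter

def actual_link_counts(directories, root):
    """directories: dict mapping a directory inode to its list of
    (name, inode) entries. Counts every entry reachable from root,
    including the "." and ".." entries."""
    counts, seen, stack = Counter(), set(), [root]
    while stack:
        d = stack.pop()
        if d in seen:
            continue
        seen.add(d)
        for name, ino in directories[d]:
            counts[ino] += 1
            if ino in directories and name not in (".", ".."):
                stack.append(ino)
    return counts

def link_count_actions(stored, actual):
    """Suggest an action for each inode whose stored count disagrees."""
    actions = {}
    for ino, n in stored.items():
        found = actual.get(ino, 0)
        if n != 0 and found == 0:
            actions[ino] = "reconnect to lost+found"
        elif n != 0 and found != 0 and n != found:
            actions[ino] = "set link count to %d" % found
    return actions
```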




Duplicate Blocks 

Contained in each inode is a list or pointers to lists (indirect blocks) 
of all the blocks claimed by the inode. Fsck compares each block 
number claimed by an inode to a list of already allocated blocks. If a 
block number is already claimed by another inode, the block number 
is added to a list of duplicate blocks. Otherwise, the list of allocated 
blocks is updated to include the block number. If there are any 
duplicate blocks, fsck will make a partial second pass of the inode 
list to find the inode of the duplicated block. This is necessary 
because without examining the files associated with these inodes for 
correct content there is not enough information available to decide 
which inode is corrupted and should be cleared. Most of the time, the 
inode with the earliest modify time is incorrect and should be 
cleared. This condition can occur by using a file system with blocks 
claimed by both the free-block list and by other parts of the file 
system. 

A large number of duplicate blocks in an inode may be due to an 
indirect block not being written to the file system. Fsck will prompt 
the operator to clear both inodes. 
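
The two passes over the inode list can be sketched as follows. The
first pass keeps only a set of allocated blocks and a list of
duplicates; the second pass rescans the inodes to find every
claimant of each duplicated block. For simplicity this sketch
rescans the whole list rather than part of it, and the data layout
and function name are assumptions.

```python
def find_duplicates(inode_blocks):
    """inode_blocks: list of (inode, [block numbers]) in inode order.
    Returns a dict mapping each duplicated block number to the list
    of inodes that claim it."""
    allocated, dups = set(), []
    # First pass: mark blocks as allocated, noting repeat claims.
    for ino, blocks in inode_blocks:
        for b in blocks:
            if b in allocated:
                dups.append((b, ino))
            else:
                allocated.add(b)
    # Second pass: find all inodes claiming a duplicated block.
    claims = {b: [] for b, _ in dups}
    if claims:
        for ino, blocks in inode_blocks:
            for b in blocks:
                if b in claims and ino not in claims[b]:
                    claims[b].append(ino)
    return claims
```

The result names the inodes in conflict; deciding which one to clear
still requires the operator to examine the files involved.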


Bad Blocks 

Contained in each inode is a list or pointer to lists of all the blocks 
claimed by the inode. Fsck checks each block number claimed by an 
inode for a value lower than that of the first data block or greater 
than the last block in the file system. If the block number is outside 
this range, the block number is a bad block number. 


If there is a large number of bad blocks in an inode, this may be due 
to an indirect block not being written to the file system. Fsck will 
prompt the operator to clear both inodes. 




Size Checks 

Each inode contains a 32-bit (4-byte) size field. This size indicates 
the number of characters in the file associated with the inode. This 
size can be checked for inconsistencies, e.g., directory sizes that are 
not a multiple of 16 characters or the number of blocks actually used 
not matching that indicated by the inode size. 

A directory inode within the file system has the directory bit on in 
the inode mode word. The directory size must be a multiple of 16 
because a directory entry contains 16 bytes (2 bytes for the inode 
number and 14 bytes for the file or directory name). 
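
The 16-byte directory entry layout can be illustrated with a short
parser. The field sizes are from the text; the little-endian byte
order of the inode number and the convention that inode number 0
marks an empty slot are assumptions made for the sketch.

```python
import struct

ENTRY_SIZE = 16   # 2-byte inode number plus 14-byte name, from the text

def parse_directory(data):
    """Split raw directory bytes into (inode, name) pairs."""
    if len(data) % ENTRY_SIZE != 0:
        raise ValueError("directory size is not a multiple of 16")
    entries = []
    for offset in range(0, len(data), ENTRY_SIZE):
        # "<H14s": little-endian unsigned short, then 14 raw bytes.
        ino, raw = struct.unpack_from("<H14s", data, offset)
        if ino != 0:                      # assumption: 0 means empty slot
            entries.append((ino, raw.rstrip(b"\0").decode("ascii")))
    return entries
```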


Fsck will warn of such directory misalignment. This is only a 
warning because not enough information can be gathered to correct 
the misalignment. 


A rough check of the consistency of the size field of an inode can be 
performed by computing from the size field the number of blocks that 
should be associated with the inode and comparing it to the actual 
number of blocks claimed by the inode. 

Fsck calculates the number of blocks that there should be in an 
inode by dividing the number of characters in an inode by the 
number of characters per block and rounding up. Fsck adds one 
block for each indirect block associated with the inode. If the actual 
number of blocks does not match the computed number of blocks, 
fsck will warn of a possible file-size error. This is only a warning 
because the UNIX system does not fill in blocks in files created in 
random order. 
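
The block-count calculation can be written out as a small sketch.
The block size of 1024 bytes is an assumption (it matches the system
buffer size mentioned earlier in this chapter), and the number of
indirect blocks is supplied by the caller rather than derived.

```python
BLOCK_SIZE = 1024   # assumption: characters per block

def expected_blocks(size, indirect_blocks=0, block_size=BLOCK_SIZE):
    """Blocks implied by the size field, plus one per indirect block."""
    data_blocks = (size + block_size - 1) // block_size   # round up
    return data_blocks + indirect_blocks

def possible_size_error(size, blocks_claimed, indirect_blocks=0):
    """True when the claimed block count differs from the computed one."""
    return expected_blocks(size, indirect_blocks) != blocks_claimed
```

A mismatch is only grounds for a warning, since a file written in
random order may legitimately claim fewer blocks than its size
implies.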


Indirect Blocks 

Indirect blocks are owned by an inode. Therefore, inconsistencies in
indirect blocks directly affect the inode that owns them.

Inconsistencies that can be checked are blocks already claimed by 
another inode and block numbers outside the range of the file system. 




For a discussion of detection and correction of the inconsistencies 
associated with indirect blocks, see parts “Duplicate Blocks” and 
“Bad Blocks”. 


Data Blocks 

The two types of data blocks are plain data blocks and directory data 
blocks. Plain data blocks contain the information stored in a file. 
Directory data blocks contain directory entries. Fsck does not 
attempt to check the validity of the contents of a plain data block. 


Each directory data block can be checked for inconsistencies
involving directory inode numbers pointing to unallocated inodes,
directory inode numbers greater than the number of inodes in the file
system, incorrect directory inode numbers for “.” and “..”, and
directories disconnected from the file system. In addition, the
validity of the contents of a directory’s data block is checked.


If a directory entry inode number points to an unallocated inode, 
then fsck may remove that directory entry. This condition probably 
occurred because the data blocks containing the directory entries 
were modified and written out while the inode was not yet written 
out. 


If a directory entry inode number is pointing beyond the end of the 
inode list, fsck may remove that directory entry. This condition 
occurs if bad data is written into a directory data block. 


The directory inode number entry for “.” should be the first entry in
the directory data block. Its value should be equal to the inode
number for the directory data block.


The directory inode number entry for “..” should be the second entry
in the directory data block. Its value should be equal to the inode
number for the parent of the directory entry (or the inode number of
the directory data block if the directory is the root directory).


If the directory inode numbers are incorrect, fsck may replace them 
with the correct values. 
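
The check of the first two directory entries (the entry for the
directory itself, ".", and the entry for its parent, "..") might be
sketched as below. The (inode, name) pair representation and the
function name are assumptions made for the example.

```python
def check_dot_entries(entries, self_ino, parent_ino):
    """entries: list of (inode, name) pairs in on-disk order.
    Returns a list of (index, correct_inode) fixes; the list is
    empty when both entries are consistent. For the root directory,
    pass the root inode number as parent_ino."""
    fixes = []
    expected = [(".", self_ino), ("..", parent_ino)]
    for index, (name, want) in enumerate(expected):
        if index >= len(entries) or entries[index][1] != name:
            raise ValueError("entry %d is not %r" % (index, name))
        if entries[index][0] != want:
            fixes.append((index, want))
    return fixes
```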





Fsck checks the general connectivity of the file system. If 
directories are found not to be linked into the file system, fsck will 
link the directory back into the file system in the lost+found 
directory. This condition can be caused by inodes being written to 
the file system with the corresponding directory data blocks not 
being written to the file system. 
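
The connectivity check amounts to finding directories that cannot be
reached from the root. A minimal sketch, assuming directories are
held as a dictionary from inode number to (name, inode) entries:

```python
def unreferenced_directories(directories, root):
    """Return the sorted list of directory inodes not reachable
    from root by following directory entries downward."""
    reached, stack = set(), [root]
    while stack:
        d = stack.pop()
        if d in reached:
            continue
        reached.add(d)
        for name, ino in directories.get(d, []):
            # Descend only through real subdirectory entries.
            if ino in directories and name not in (".", ".."):
                stack.append(ino)
    return sorted(set(directories) - reached)
```

Each directory reported this way is a candidate for reconnection in
lost+found.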


Free-List Blocks 

Free-list blocks are owned by the superblock. Therefore, 
inconsistencies in free-list blocks directly affect the superblock. 


Inconsistencies that can be checked are a list count outside of range, 
block numbers outside of range, and blocks already associated with 
the file system. 

For a discussion of detection and correction of the inconsistencies 
associated with free-list blocks, see part “Free-Block List”. 







APPENDIX 6-1 
FSCK ERROR CONDITIONS 
A. Conventions 

Fsck is a multipass file system check program. Each file system 
pass invokes a different phase of the fsck program. After the initial 
setup, fsck performs successive phases over each file system 
performing cleanup, checking blocks and sizes, pathnames, 
connectivity, reference counts, and the free-block list (possibly 
rebuilding it). 


When an inconsistency is detected, fsck reports the error condition 
to the operator. If a response is required, fsck prints a prompt 
message and waits for a response. This appendix explains the 
meaning of each error condition, the possible responses, and the 
related error conditions. 


The error conditions are organized by the “Phase” of the fsck 
program in which they can occur. The error conditions that may 
occur in more than one phase will be discussed under Part B. 

B. Initialization 

Before a file system check can be performed, certain tables have to be 
set up and certain files opened. This section describes the opening of 
files and the initialization of tables. Error conditions resulting from 
command line options, memory requests, opening of files, status of 
files, file system size checks, and creation of the scratch file are listed 
below. 

C option? 

C is not a legal option to fsck; legal options are -y, -n, -s, -S, -t,
-r, -q, and -D. Fsck terminates on this error condition. See the
fsck(1M) entry in the UNIX System V Administrator Reference
Manual for further details.

Bad -t option

The -t option is not followed by a file name. Fsck terminates on
this error condition. See the fsck(1M) entry in the UNIX System V
Administrator Reference Manual for further details.




Invalid -s argument, defaults assumed

The -s option is not suffixed by 3, 4, or
blocks-per-cylinder:blocks-to-skip. Fsck assumes a default value of
400 blocks-per-cylinder and 9 blocks-to-skip. See the fsck(1M) entry
in the UNIX System V Administrator Reference Manual for further
details.

Incompatible options: -n and -s

It is not possible to salvage the free-block list without modifying the
file system. Fsck terminates on this error condition. See the
fsck(1M) entry in the UNIX System V Administrator Reference
Manual for further details.

Can not fstat standard input 

Fsck’s attempt to fstat standard input failed. The occurrence of 
this error condition indicates a serious problem which may require 
additional assistance. Fsck terminates on this error condition. 

Can not get memory 

Fsck’s request for memory for its virtual memory tables failed. The 
occurrence of this error condition indicates a serious problem which 
may require additional assistance. Fsck terminates on this error 
condition. 

Can not open checkall file: F 

The default file system checkall file F (usually /etc/checkall) cannot 
be opened for reading. Fsck terminates on this error condition. 
Check access modes of F. 

Can not stat root 

Fsck’s request for statistics about the root directory “/” failed. The 
occurrence of this error condition indicates a serious problem which 
may require additional assistance. Fsck terminates on this error 
condition. 

Can not stat F 

Fsck’s request for statistics about the file system F failed. It 
ignores this file system and continues checking the next file system 
given. Check access modes of F. 





F is not a block or character device 

Fsck has been given a regular file name by mistake. It ignores this 
file system and continues checking the next file system given. Check 
file type of F. 

Can not open F 

The file system F cannot be opened for reading. It ignores this file 
system and continues checking the next file system given. Check 
access modes of F. 

Size check: fsize X isize Y 

More blocks are used for the inode list Y than there are blocks in the 
file system X, or there are more than 65,535 inodes in the file system. 
It ignores this file system and continues checking the next file system 
given. 

Can not create F 

Fsck’s request to create a scratch file F failed. It ignores this file 
system and continues checking the next file system given. Check 
access modes of F. 

CAN NOT SEEK: BLK B (CONTINUE) 

Fsck’s request for moving to a specified block number B in the file 
system failed. The occurrence of this error condition indicates a 
serious problem which may require additional assistance. 

Possible responses to CONTINUE prompt are: 

YES Attempt to continue to run file system check. Often, however,
the problem will persist. This error condition will not allow a
complete check of the file system. A second run of fsck should be
made to recheck this file system. If the block was part of the
virtual memory buffer cache, fsck will terminate with the message
“Fatal I/O error”.

NO Terminate program.




CAN NOT READ: BLK B (CONTINUE) 

Fsck’s request for reading a specified block number B in the file 
system failed. The occurrence of this error condition indicates a 
serious problem which may require additional assistance. 

Possible responses to CONTINUE prompt are: 

YES Attempt to continue to run file system check. Often, however,
the problem will persist. This error condition will not allow a
complete check of the file system. A second run of fsck should be
made to recheck this file system. If the block was part of the
virtual memory buffer cache, fsck will terminate with the message
“Fatal I/O error”.

NO Terminate program.

CAN NOT WRITE: BLK B (CONTINUE) 

Fsck’s request for writing a specified block number B in the file 
system failed. The disk is write-protected. 


Possible responses to CONTINUE prompt are: 

YES Attempt to continue to run file system check. Often, however,
the problem will persist. This error condition will not allow a
complete check of the file system. A second run of fsck should be
made to recheck this file system. If the block was part of the
virtual memory buffer cache, fsck will terminate with the message
“Fatal I/O error”.

NO Terminate program.

C. PHASE 1: CHECK BLOCKS AND SIZES 

This phase concerns itself with the inode list. This part lists error 
conditions resulting from checking inode types, setting up the zero- 
link-count table, examining inode block numbers for bad or duplicate 
blocks, checking inode size, and checking inode format. 





UNKNOWN FILE TYPE I=I (CLEAR)

The mode word of the inode I indicates that the inode is not a special
character inode, regular inode, or directory inode.

Possible responses to CLEAR prompt are:

YES Deallocate inode I by zeroing its contents. This will always
invoke the UNALLOCATED error condition in Phase 2 for each
directory entry pointing to this inode.

NO Ignore this error condition.

LINK COUNT TABLE OVERFLOW (CONTINUE) 

An internal table for fsck containing allocated inodes with a link 
count of zero has no more room. Recompile fsck with a larger value 
of MAXLNCNT. 


Possible responses to CONTINUE prompt are: 


YES Continue with program. This error condition will not allow a
complete check of the file system. A second run of fsck should be
made to recheck this file system. If another allocated inode with a
zero link count is found, this error condition is repeated.

NO Terminate program.

B BAD I=I

Inode I contains block number B with a number lower than the
number of the first data block in the file system or greater than the
number of the last block in the file system. This error condition may
invoke the EXCESSIVE BAD BLKS error condition in Phase 1 if
inode I has too many block numbers outside the file system range.
This error condition will always invoke the BAD/DUP error
condition in Phase 2 and Phase 4.




EXCESSIVE BAD BLKS I=I (CONTINUE)

There is more than a tolerable number (usually 10) of blocks with a
number lower than the number of the first data block in the file
system or greater than the number of the last block in the file
system associated with inode I.

Possible responses to CONTINUE prompt are:

YES Ignore the rest of the blocks in this inode and continue
checking with next inode in the file system. This error condition
will not allow a complete check of the file system. A second run
of fsck should be made to recheck this file system.

NO Terminate program.

B DUP I=I

Inode I contains block number B which is already claimed by another
inode. This error condition may invoke the EXCESSIVE DUP BLKS
error condition in Phase 1 if inode I has too many block numbers
claimed by other inodes. This error condition will always invoke
Phase 1b and the BAD/DUP error condition in Phase 2 and Phase 4.

EXCESSIVE DUP BLKS I=I (CONTINUE)

There is more than a tolerable number (usually 10) of blocks claimed
by other inodes.

Possible responses to CONTINUE prompt are:

YES Ignore the rest of the blocks in this inode and continue
checking with next inode in the file system. This error condition
will not allow a complete check of the file system. A second run
of fsck should be made to recheck this file system.

NO Terminate program.




DUP TABLE OVERFLOW (CONTINUE) 

An internal table in fsck containing duplicate block numbers has no 
more room. Recompile fsck with a larger value of DUPTBLSIZE. 

Possible responses to CONTINUE prompt are: 

YES Continue with program. This error condition will not allow a
complete check of the file system. A second run of fsck should be
made to recheck this file system. If another duplicate block is
found, this error condition will repeat.

NO Terminate program.

POSSIBLE FILE SIZE ERROR I=I

The inode I size does not match the actual number of blocks used by
the inode. This is only a warning. If the -q option is used, this
message is not printed.

DIRECTORY MISALIGNED I=I

The size of a directory inode is not a multiple of the size of a
directory entry (usually 16). This is only a warning. If the -q option
is used, this message is not printed.

PARTIALLY ALLOCATED INODE I=I (CLEAR)

Inode I is neither allocated nor unallocated.

Possible responses to CLEAR prompt are:

YES Deallocate inode I by zeroing its contents.

NO Ignore this error condition.

D. PHASE 1B: RESCAN FOR MORE DUPS

When a duplicate block is found in the file system, the file system is 
rescanned to find the inode which previously claimed that block. 
This part lists the error condition when the duplicate block is found. 




B DUP I=I

Inode I contains block number B which is already claimed by another 
inode. This error condition will always invoke the BAD/DUP error 
condition in Phase 2. Inodes with overlapping blocks may be 
determined by examining this error condition and the DUP error 
condition in Phase 1. 

E. PHASE 2: CHECK PATHNAMES 

This phase concerns itself with removing directory entries pointing to
error conditioned inodes from Phase 1 and Phase 1b. This part lists
error conditions resulting from root inode mode and status, directory
inode pointers in range, and directory entries pointing to bad inodes.

ROOT INODE UNALLOCATED. TERMINATING 

The root inode (always inode number 2) has no allocate mode bits. 
The occurrence of this error condition indicates a serious problem 
which may require additional assistance. The program will 
terminate. 

ROOT INODE NOT DIRECTORY (FIX)

The root inode (usually inode number 2) is not of directory inode
type.

Possible responses to FIX prompt are:

YES Change the root inode’s type to be a directory. If the root
inode’s data blocks are not directory blocks, a very large number
of error conditions will be produced.

NO Terminate program.

DUPS/BAD IN ROOT INODE (CONTINUE)

Phase 1 or Phase 1b has found duplicate blocks or bad blocks in the
root inode (usually inode number 2) for the file system.

Possible responses to CONTINUE prompt are:

YES Ignore DUPS/BAD error condition in root inode and attempt to
continue to run the file system check. If the root inode is not
correct, then this may result in a large number of other error
conditions.

NO Terminate program.

I OUT OF RANGE I=I NAME=F (REMOVE)

A directory entry F has an inode number I which is greater than the 
end of the inode list. 


Possible responses to REMOVE prompt are: 

YES The directory entry F is removed. 

NO Ignore this error condition. 

UNALLOCATED I=I OWNER=O MODE=M SIZE=S
MTIME=T NAME=F (REMOVE)

A directory entry F has an inode I without allocate mode bits. The
owner O, mode M, size S, modify time T, and file name F are printed.
If the file system is not mounted and the -n option was not specified,
the entry will be removed automatically if the inode it points to has
a size of 0 characters.


Possible responses to REMOVE prompt are: 

YES The directory entry F is removed. 

NO Ignore this error condition. 

DUP/BAD I=I OWNER=O MODE=M SIZE=S MTIME=T
DIR=F (REMOVE)

Phase 1 or Phase 1b has found duplicate blocks or bad blocks
associated with directory entry F, directory inode I. The owner O,
mode M, size S, modify time T, and directory name F are printed.

Possible responses to REMOVE prompt are:

YES The directory entry F is removed.

NO Ignore this error condition.




DUP/BAD I=I OWNER=O MODE=M SIZE=S MTIME=T
FILE=F (REMOVE)

Phase 1 or Phase 1b has found duplicate blocks or bad blocks
associated with directory entry F, inode I. The owner O, mode M,
size S, modify time T, and file name F are printed.


Possible responses to REMOVE prompt are: 

YES The directory entry F is removed. 

NO Ignore this error condition. 

BAD BLK B IN DIR I=I OWNER=O MODE=M SIZE=S
MTIME=T

This message only occurs when the -q option is used. A bad block
was found in DIR inode I. Error conditions looked for in directory
blocks are nonzero padded entries, inconsistent “.” and “..” entries,
and embedded slashes in the name field. This error message indicates
that the user should at a later time either remove the directory inode
if the entire block looks bad or change (or remove) those directory
entries that look bad.

F. PHASE 3: CHECK CONNECTIVITY 

This phase concerns itself with the directory connectivity seen in 
Phase 2. This part lists error conditions resulting from unreferenced 
directories and missing or full lost+found directories. 

UNREF DIR I=I OWNER=O MODE=M SIZE=S MTIME=T 
(RECONNECT) 

The directory inode I was not connected to a directory entry when the 
file system was traversed. The owner O, mode M, size S, and modify 
time T of directory inode I are printed. Fsck will force the 
reconnection of a nonempty directory. 


Possible responses to RECONNECT prompt are: 

YES Reconnect directory inode I to the file system in 
directory for lost files (usually lost+found). 
This may invoke lost+found error condition in 
Phase 3 if there are problems connecting 
directory inode I to lost+found. This may also 
invoke CONNECTED error condition in Phase 3 
if link was successful. 

NO Ignore this error condition. This will always 
invoke UNREF error condition in Phase 4. 

SORRY. NO lost+found DIRECTORY 

There is no lost+found directory in the root directory of the file 
system; fsck ignores the request to link a directory in lost+found. 
This will always invoke the UNREF error condition in Phase 4. 
Check access modes of lost+found. See fsck(1M) in the UNIX 
System V Administrator Reference Manual for further details. 

SORRY. NO SPACE IN lost+found DIRECTORY 

There is no space to add another entry to the lost+found directory in 
the root directory of the file system; fsck ignores the request to link 
a directory in lost+found. This will always invoke the UNREF error 
condition in Phase 4. Clean out unnecessary entries in lost+found or 
make lost+found larger. See fsck(1M) in the UNIX System V 
Administrator Reference Manual for further details. 

DIR I=I1 CONNECTED. PARENT WAS I=I2 

This is an advisory message indicating a directory inode I1 was 
successfully connected to the lost+found directory. The parent inode 
I2 of the directory inode I1 is replaced by the inode number of the 
lost+found directory. 

G. PHASE 4: CHECK REFERENCE COUNTS 

This phase concerns itself with the link count information seen in 
Phase 2 and Phase 3. This part lists error conditions resulting from 
unreferenced files; missing or full lost+found directory; incorrect link 
counts for files, directories, or special files; unreferenced files and 
directories; bad and duplicate blocks in files and directories; and 
incorrect total free-inode counts. 

UNREF FILE I=I OWNER=O MODE=M SIZE=S MTIME=T 
(RECONNECT) 

Inode I was not connected to a directory entry when the file system 
was traversed. The owner O, mode M, size S, and modify time T of 
inode I are printed. If the -n option is not set and the file system is 
not mounted, empty files will not be reconnected and will be cleared 
automatically. 




Possible responses to RECONNECT prompt are: 

YES Reconnect inode I to file system in the directory 
for lost files (usually lost+found). This may 
invoke lost+found error condition in Phase 4 if 
there are problems connecting inode I to 
lost+found. 

NO Ignore this error condition. This will always 
invoke CLEAR error condition in Phase 4. 

SORRY. NO lost+found DIRECTORY 

There is no lost+found directory in the root directory of the file 
system; fsck ignores the request to link a file in lost+found. This 
will always invoke CLEAR error condition in Phase 4. Check access 
modes of lost+found. 

SORRY. NO SPACE IN lost+found DIRECTORY 

There is no space to add another entry to the lost+found directory in 
the root directory of the file system; fsck ignores the request to link 
a file in lost+found. This will always invoke the CLEAR error 
condition in Phase 4. Check size and contents of lost+found. 

(CLEAR) 

The inode mentioned in the immediately previous error condition 
cannot be reconnected. 


Possible responses to CLEAR prompt are: 

YES Deallocate inode mentioned in the immediately 

previous error condition by zeroing its contents. 

NO Ignore this error condition. 

LINK COUNT FILE I=I OWNER=O MODE=M SIZE=S 
MTIME=T COUNT=X SHOULD BE Y (ADJUST) 

The link count for inode I, which is a file, is X but should be Y. The 
owner O, mode M, size S, and modify time T are printed. 




Possible responses to ADJUST prompt are: 

YES Replace link count of file inode I with Y. 

NO Ignore this error condition. 

LINK COUNT DIR I=I OWNER=O MODE=M SIZE=S 
MTIME=T COUNT=X SHOULD BE Y (ADJUST) 

The link count for inode I, which is a directory, is X but should be Y. 
The owner O, mode M, size S, and modify time T of directory inode I 
are printed. 

Possible responses to ADJUST prompt are: 

YES Replace link count of directory inode I with Y. 

NO Ignore this error condition. 

LINK COUNT F I=I OWNER=O MODE=M SIZE=S 
MTIME=T COUNT=X SHOULD BE Y (ADJUST) 

The link count for F inode I is X but should be Y. The file name F, 
owner O, mode M, size S, and modify time T are printed. 


Possible responses to ADJUST prompt are: 

YES Replace link count of inode I with Y. 

NO Ignore this error condition. 

UNREF FILE I=I OWNER=O MODE=M SIZE=S MTIME=T 
(CLEAR) 

Inode I, which is a file, was not connected to a directory entry when 
the file system was traversed. The owner O, mode M, size S, and 
modify time T of inode I are printed. If the -n option is not set and 
the file system is not mounted, empty files will be cleared 
automatically. 


Possible responses to CLEAR prompt are: 

YES Deallocate inode I by zeroing its contents. 

NO Ignore this error condition. 

UNREF DIR I=I OWNER=O MODE=M SIZE=S MTIME=T 
(CLEAR) 

Inode I, which is a directory, was not connected to a directory entry 
when the file system was traversed. The owner O, mode M, size S, 
and modify time T of inode I are printed. If the -n option is not set 
and the file system is not mounted, empty directories will be cleared 
automatically. Nonempty directories will not be cleared. 


Possible responses to CLEAR prompt are: 


YES Deallocate inode I by zeroing its contents. 

NO Ignore this error condition. 

BAD/DUP FILE I=I OWNER=O MODE=M SIZE=S 
MTIME=T (CLEAR) 

Phase 1 or Phase 1b have found duplicate blocks or bad blocks 
associated with file inode I. The owner O, mode M, size S, and 
modify time T of inode I are printed. 


Possible responses to CLEAR prompt are: 

YES Deallocate inode I by zeroing its contents. 

NO Ignore this error condition. 

BAD/DUP DIR I=I OWNER=O MODE=M SIZE=S MTIME=T 
(CLEAR) 

Phase 1 or Phase 1b have found duplicate blocks or bad blocks 
associated with directory inode I. The owner O, mode M, size S, and 
modify time T of inode I are printed. 


Possible responses to CLEAR prompt are: 

YES Deallocate inode I by zeroing its contents. 

NO Ignore this error condition. 


FREE INODE COUNT WRONG IN SUPERBLK (FIX) 

The actual count of the free inodes does not match the count in the 
superblock of the file system. If the -q option is specified, the count 
will be fixed automatically in the superblock. 


Possible responses to FIX prompt are: 

YES Replace count in superblock by actual count. 

NO Ignore this error condition. 

H. PHASE 5: CHECK FREE LIST 

This phase concerns itself with the free-block list. This part lists 
error conditions resulting from bad blocks in the free-block list, bad 
free-blocks count, duplicate blocks in the free-block list, unused 
blocks from the file system not in the free-block list, and the total 
free-block count incorrect. 

EXCESSIVE BAD BLKS IN FREE LIST (CONTINUE) 

The free-block list contains more than a tolerable number (usually 
10) of blocks with a value less than the first data block in the file 
system or greater than the last block in the file system. 

Possible responses to CONTINUE prompt are: 

YES Ignore rest of the free-block list and continue 
execution of fsck. This error condition will 
always invoke “BAD BLKS IN FREE LIST” 
error condition in Phase 5. 

NO Terminate program. 

EXCESSIVE DUP BLKS IN FREE LIST (CONTINUE) 

The free-block list contains more than a tolerable number (usually 
10) of blocks claimed by inodes or earlier parts of the free-block list. 

Possible responses to CONTINUE prompt are: 

YES Ignore the rest of the free-block list and 
continue execution of fsck. This error 
condition will always invoke “DUP BLKS IN 
FREE LIST” error condition in Phase 5. 

NO Terminate program. 

BAD FREEBLK COUNT 

The count of free blocks in a free-list block is greater than 50 or less 
than 0. This error condition will always invoke the “BAD FREE 
LIST” condition in Phase 5. 

X BAD BLKS IN FREE LIST 

X blocks in the free-block list have a block number lower than the 
first data block in the file system or greater than the last block in 
the file system. This error condition will always invoke the “BAD 
FREE LIST” condition in Phase 5. 

X DUP BLKS IN FREE LIST 

X blocks claimed by inodes or earlier parts of the free-list block were 
found in the free-block list. This error condition will always invoke 
the “BAD FREE LIST” condition in Phase 5. 

X BLK(S) MISSING 

X blocks unused by the file system were not found in the free-block 
list. This error condition will always invoke the “BAD FREE LIST” 
condition in Phase 5. 

FREE BLK COUNT WRONG IN SUPERBLOCK (FIX) 

The actual count of free blocks does not match the count in the 
superblock of the file system. 


Possible responses to FIX prompt are: 

YES Replace count in superblock by actual count. 

NO Ignore this error condition. 

BAD FREE LIST (SALVAGE) 

Phase 5 has found bad blocks in the free-block list, duplicate blocks 
in the free-block list, or blocks missing from the file system. If the 
-q option is specified, the free-block list will be salvaged 
automatically. 




Possible responses to SALVAGE prompt are: 

YES Replace actual free-block list with a new free-block 
list. The new free-block list will be 
ordered to reduce time spent by the disk 
waiting for the disk to rotate into position. 

NO Ignore this error condition. 

I. PHASE 6: SALVAGE FREE LIST 

This phase concerns itself with the free-block list reconstruction. 
This part lists error conditions resulting from the blocks-to-skip and 
blocks-per-cylinder values. 

Default free-block list spacing assumed 

This is an advisory message indicating the blocks-to-skip is greater 
than the blocks-per-cylinder, the blocks-to-skip is less than 1, the 
blocks-per-cylinder is less than 1, or the blocks-per-cylinder is greater 
than 500. The default values of 9 blocks-to-skip and 400 
blocks-per-cylinder are used. See fsck(1M) in the UNIX System V 
Administrator Reference Manual for further details. 
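
The range test described above can be sketched as a small shell 
function. This is an illustration of the stated rule only, not the 
code fsck itself uses; the function name free_list_spacing is 
hypothetical. 

```shell
#!/bin/sh
# Sketch only: mirrors the rule stated in the text, not fsck's source.
# free_list_spacing SKIP PER_CYL
# Prints the blocks-to-skip and blocks-per-cylinder values that would
# be used after the range check.
free_list_spacing() {
    skip=$1
    per_cyl=$2
    if [ "$skip" -gt "$per_cyl" ] || [ "$skip" -lt 1 ] ||
       [ "$per_cyl" -lt 1 ] || [ "$per_cyl" -gt 500 ]; then
        # Out of range: report it and fall back to the defaults.
        echo "Default free-block list spacing assumed" >&2
        echo "9 400"
    else
        echo "$skip $per_cyl"
    fi
}

free_list_spacing 7 418     # within limits: the values are kept
free_list_spacing 0 418     # blocks-to-skip below 1: defaults are used
```

Any single out-of-range value causes both defaults to be used 
together. 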

J. CLEANUP 

Once a file system has been checked, a few cleanup functions are 
performed. This part lists advisory messages about the file system 
and modify status of the file system. 

X files Y blocks Z free 

This is an advisory message indicating that the file system checked 
contained X files using Y blocks leaving Z blocks free in the file 
system. 

***** BOOT UNIX (NO SYNC!) ***** 

This is an advisory message indicating that a mounted file system or 
the root file system has been modified by fsck. If the UNIX system 
is not rebooted immediately without sync, the work done by fsck 
may be undone by the in-core copies of tables the UNIX system 
keeps. 




***** FILE SYSTEM WAS MODIFIED ***** 

This is an advisory message indicating that the current file system 
was modified by fsck. 





Chapter 7 

LP SPOOLING SYSTEM 


GENERAL 

The line printer (LP) program is a series of commands that perform 
diverse spooling functions under the UNIX operating system. Since 
the primary LP application is off-line printing, this document focuses 
mainly on spooling to line printers. LP allows administrators to 
customize the system to spool to a collection of line printers of any 
type and to group printers into logical classes in order to maximize 
the throughput of the devices. Users are provided the capabilities of: 

• Queuing and canceling print requests 

• Preventing and allowing queuing to devices 

• Starting and stopping LP from processing requests 

• Changing configuration of printers 

• Finding status of the LP system. 


This chapter describes the role of an LP administrator in performing 
restricted functions and overseeing the smooth operation of LP. 


Throughout this chapter, each reference of the form name(1M), 
name(7), or name(8) refers to entries in the UNIX System V 
Administrator Reference Manual. References to entries of the form 
name(N), where "N" is the number 1 or 6 possibly followed by a 
letter, refer to entry name in section N of the UNIX System V User 
Reference Manual. If "N" is a number 2 through 5 possibly followed 
by a letter, refer to entry name in section N of the UNIX System V 
Programmer Reference Manual. 





LP SPOOLING 


OVERVIEW OF LP FEATURES 


Definitions 

Several terms must be defined before presenting a brief summary of 
LP commands. The LP was designed with the flexibility to meet the 
needs of users on different UNIX systems. Changes to the LP 
configuration are performed by the lpadmin(1M) command. 


LP makes a distinction between printers and printing devices. A 
device is a physical peripheral device or a file and is represented by a 
full UNIX system pathname. A printer is a logical name that 
represents a device. At different points in time, a printer may be 
associated with different devices. A class is a name given to an 
ordered list of printers. Every class must contain at least one 
printer. Each printer may be a member of zero or more classes. A 
destination is a printer or a class. One destination may be 
designated as the system default destination. The lp(1) command 
will direct all output to this destination unless the user specifies 
otherwise. Output that is routed to a printer will be printed only by 
that printer, whereas output directed to a class will be printed by the 
first available class member. 
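
The routing rule for classes can be illustrated with a short shell 
sketch. It is not the scheduler's own logic; the function name 
pick_printer and the printer names a and b are hypothetical, and 
"ready" here simply means present in the second list. 

```shell
#!/bin/sh
# Sketch only: illustrates "first available class member".
# pick_printer "MEMBERS" "READY"
#   MEMBERS  class members in class order, separated by spaces
#   READY    printers currently able to print, separated by spaces
# Prints the first class member that is ready; returns 1 if none is.
pick_printer() {
    for member in $1; do
        for ready in $2; do
            if [ "$member" = "$ready" ]; then
                echo "$member"
                return 0
            fi
        done
    done
    return 1
}

pick_printer "b a" "a b"    # both ready: the first member, b, is chosen
pick_printer "b a" "a"      # only a is ready: a is chosen
```

The order of the class list, not the order in which printers became 
ready, decides the choice when several members are available. 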


Each invocation of lp creates an output request that consists of the 
files to be printed and options from the lp command line. An 
interface program which formats requests must be supplied for each 
printer. The LP scheduler, lpsched(1M), services requests for all 
destinations by routing requests to interface programs to do the 
printing on devices. An LP configuration for a system consists of 
devices, destinations, and interface programs. 


Commands 


Commands for General Use 

The lp(1) command is used to request the printing of files. It creates 
an output request and returns a request id of the form 

dest-seqno 

to the user, where seqno is a unique sequence number across the 
entire LP system and dest is the destination where the request was 
routed. 
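
A request id can be split into its two parts with ordinary shell 
parameter expansion. The sketch below assumes a shell that supports 
the % and ## expansion operators; the function names are 
hypothetical, and the id zoo-543 is taken from an example later in 
this chapter. 

```shell
#!/bin/sh
# Sketch only: split a request id of the form dest-seqno.
# Destination names contain only letters, digits, and underscores,
# so the last hyphen separates the destination from the number.
request_dest()  { echo "${1%-*}"; }     # strip the last "-..." part
request_seqno() { echo "${1##*-}"; }    # keep what follows the last "-"

id=zoo-543
echo "destination: $(request_dest "$id")"
echo "sequence:    $(request_seqno "$id")"
```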


Cancel is used to cancel output requests. The user supplies request 
ids as returned by lp or printer names, in which case the currently 
printing requests on those printers are canceled. 


Disable prevents lpsched from routing output requests to printers. 


Enable(1) allows lpsched to route output requests to printers. 


Commands for LP Administrators 

Each LP system must designate a person or persons as LP 
administrator to perform the restricted functions listed below. 
Either the superuser or any user who is logged into the UNIX system 
as lp qualifies as an LP administrator. All LP files and commands 
are owned by lp except for lpadmin and lpsched which are owned 
by root. The following commands will be described in more detail 
later in this chapter. 


lpadmin(1M)   Modifies LP configuration. Many features of 
              this command cannot be used when lpsched is 
              running. 

lpsched(1M)   Routes output requests to interface programs 
              which do the printing on devices. 

lpshut        Stops lpsched from running. All printing 
              activity is halted, but other LP commands may 
              still be used. 

accept(1M)    Allows lp to accept output requests for 
              destinations. 

reject        Prevents lp from accepting requests for 
              destinations. 

lpmove        Moves output requests from one destination to 
              another. Whole destinations may be moved at 
              one time. This command cannot be used when 
              lpsched is running. 


BUILDING LP 

All LP commands are built from source code that resides in the 
/usr/src/cmd/lp directory including the make file, lp.mk. Unless 
some of the definitions in lp.mk are changed, LP may be installed 
only by the superuser. Before installing a new LP system, make sure 
there is a login called lp on your system and that the spool directory, 
/usr/spool/lp, does not exist. To install LP, perform the following: 


cd /usr/src/cmd/lp 
make -f lp.mk install 


This builds all LP commands and creates an initial LP configuration 
consisting of no printers, classes, or default destination. LP must be 
configured by an LP administrator using the lpadmin command in 
order to create a useful spooler. 


In addition, add the following code to /etc/rc. 


rm -f /usr/spool/lp/SCHEDLOCK 
/usr/lib/lpsched 
echo "LP scheduler started" 


This starts the LP scheduler each time that the UNIX system is 
restarted. 


Several variables in lp.mk may be changed before installing LP to 
customize the system: 


Variable   Default Value    Meaning 

SPOOL      /usr/spool/lp    spool directory 
ADMIN      lp               logname of LP Administrator 
GROUP      bin              group owning LP commands/data 
ADMDIR     /usr/lib         commands of administrator 
USRDIR     /usr/bin         user commands reside here 




If an existing LP spool directory is corrupted (but not the LP 
programs) or if it needs to be rebuilt from scratch, make sure that 
lpsched is not running and perform the following as superuser: 

1. Make copies of any interface programs that are not standard LP 
software. DO NOT make these copies underneath the spool 
directory. The pathname of the interface program for printer "p" 
is /usr/spool/lp/interface/p. 

2. rm -fr /usr/spool/lp 

3. make -f lp.mk new (This recreates the bare LP configuration 
described above.) 

PRECAUTIONS 

1. Some LP commands invoke other LP commands. Moving them 
after they are built will cause some commands to fail. 

2. The files under the SPOOL directory should be modified only by 
LP commands. 

3. All LP commands require set-user-id permission. If this is 
removed, the commands will fail. 

CONFIGURING LP— THE “lpadmin” COMMAND 

Changes to the LP configuration should be made by using the 
lpadmin command and not by hand. Lpadmin will not attempt to 
alter the LP configuration when lpsched is running, except where 
explicitly noted below. 


Introducing New Destinations 

The following information must be supplied to lpadmin when 
introducing a new printer: 

1. The printer name (-p printer) is an arbitrary name which must 
conform to the following rules: 

• It must be no longer than 14 characters. 

• It must consist solely of alphanumeric characters and 
underscores. 

• It must not be the name of an existing LP destination 
(printer or class). 

2. The device associated with the printer (-v device). This is the 
pathname of a hard-wired printer, a login terminal, or other file 
that is writable by lp. 

3. The printer interface program. This may be specified in one of 
three ways: 

• It may be selected from a list of model interfaces supplied 
with LP (-m model). 

• It may be the same interface that an existing printer uses 
(-e printer). 

• It may be a program supplied by the LP administrator 
(-i interface). 
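
The first two naming rules can be checked before lpadmin is run. 
The following is a minimal sketch; the function name valid_lp_name 
is hypothetical, and the check does not test the third rule 
(whether the name is already an LP destination). 

```shell
#!/bin/sh
# Sketch only: check a proposed printer or class name against the
# length and character rules given in the text.
# valid_lp_name NAME
# Returns 0 if NAME is 1 to 14 characters long and contains only
# letters, digits, and underscores; returns 1 otherwise.
valid_lp_name() {
    case $1 in
        ''|*[!A-Za-z0-9_]*) return 1 ;;   # empty, or a forbidden character
    esac
    [ "${#1}" -le 14 ]                    # length rule
}

for name in pr1 laser_printer_2 pr-1; do
    if valid_lp_name "$name"; then
        echo "$name: acceptable"
    else
        echo "$name: not acceptable"
    fi
done
```

In the loop above, laser_printer_2 fails because it is 15 characters 
long, and pr-1 fails because of the hyphen. 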


Information which need not always be supplied when creating a new 
printer includes: 

1. The user may specify -h to indicate that the device for the 
printer is hardwired or the device is the name of a file (this is 
assumed by default). If, on the other hand, the device is the 
pathname of a login terminal, then -l must be included on the 
command line. This indicates to lpsched that it must 
automatically disable this printer each time lpsched starts 
running. This fact is reported by lpstat when it indicates printer 
status: 


$ lpstat -pa 

printer a (login terminal) disabled Oct 31 11:15 - 
disabled by scheduler: login terminal 


This is done because device names for login terminals can be 
(and usually are) associated with different physical devices from 
day to day. If the scheduler did not take this action, somebody 
might log in and be surprised that LP is spooling to his/her 
terminal! 

2. The new printer may be added to an existing class or added to a 
new class (-cclass). New class names must conform to the same 
rules for new printer names. 

EXAMPLES 

The following examples will be referenced by further examples in 
later sections. 

1. Create a printer called pr1 whose device is /dev/printer and 
whose interface program is the model hp interface: 

$ /usr/lib/lpadmin -ppr1 -v/dev/printer -mhp 

2. Add a printer called pr2 whose device is /dev/tty22 and whose 
interface is a variation of the model prx interface. It is also a 
login terminal: 

$ cp /usr/spool/lp/model/prx xxx 
< edit xxx > 

$ /usr/lib/lpadmin -ppr2 -v/dev/tty22 -ixxx -l 

3. Create a printer called pr3 whose device is /dev/tty23. The pr3 
will be added to a new class called cl1 and will use the same 
interface as printer pr2: 

$ /usr/lib/lpadmin -ppr3 -v/dev/tty23 -epr2 -ccl1 

Modifying Existing Destinations 

Modifications to existing destinations must always be made with 
respect to a printer name (-pprinter). The modifications may be one 
or more of the following: 

1. The device for the printer may be changed (-vdevice). If this is 
the only modification, then this may be done even while lpsched 
is running. This facilitates changing devices for login terminals. 




2. The printer interface program may be changed (-mmodel, 
-eprinter, -iinterface). 

3. The printer may be specified as hardwired (-h) or as a login 
terminal (-l). 

4. The printer may be added to a new or existing class (-cclass). 

5. The printer may be removed from an existing class (-rclass). 
Removing the last remaining member of a class causes the class 
to be deleted. No destination may be removed if it has pending 
requests. In that case, lpmove or cancel should be used to 
move or delete the pending requests. 

EXAMPLES 

These examples are based on the LP configuration created by those in 
the previous section. 

1. Add printer pr2 to class cl1: 


$ /usr/lib/lpadmin -ppr2 -ccl1 


2. Change pr2’s interface program to the model prx interface, 
change its device to /dev/tty24, and add it to a new class called 
cl2: 


$ /usr/lib/lpadmin -ppr2 -mprx -v/dev/tty24 -ccl2 


Note that printers pr2 and pr3 now use different interface 
programs even though pr3 was originally created with the same 
interface as pr2. Printer pr2 is now a member of two classes. 

3. Specify printer pr2 as a hard-wired printer: 

$ /usr/lib/lpadmin -ppr2 -h 




4. Add printer pr1 to class cl2: 


$ /usr/lib/lpadmin -ppr1 -ccl2 


The members of class cl2 are now pr2 and pr1, in that order. 
Requests routed to class cl2 will be serviced by pr2 if both pr2 
and pr1 are ready to print; otherwise, they will be printed by the 
one which is next ready to print. 

5. Remove printers pr2 and pr3 from class cl1: 


$ /usr/lib/lpadmin -ppr2 -rcl1 
$ /usr/lib/lpadmin -ppr3 -rcl1 

Since pr3 was the last remaining member of class cl1, the class is 
removed. 

6. Add pr3 to a new class called cl3. 


$ /usr/lib/lpadmin -ppr3 -ccl3 


Specifying the System Default Destination 

The system default destination may be changed even when lpsched 
is running. 

EXAMPLES 

1. Establish class cl1 as the system default destination: 

$ /usr/lib/lpadmin -dcl1 

2. Establish no default destination: 


$ /usr/lib/lpadmin -d 





Removing Destinations 

Classes and printers may be removed only if there are no pending 
requests that were routed to them. Pending requests must either be 
canceled using cancel or moved to other destinations using lpmove 
before destinations may be removed. If the removed destination is 
the system default destination, then the system will have no default 
destination until the default destination is respecified. When the last 
remaining member of a class is removed, then the class is also 
removed. The removal of a class never implies the removal of 
printers. 

EXAMPLES 


1. Make printer pr1 the system default destination: 

$ /usr/lib/lpadmin -dpr1 

Remove printer pr1: 

$ /usr/lib/lpadmin -xpr1 

Now there is no system default destination. 

2. Remove printer pr2: 

$ /usr/lib/lpadmin -xpr2 

Class cl2 is also removed since pr2 was its only member. 

3. Remove class cl3: 

$ /usr/lib/lpadmin -xcl3 

Class cl3 is removed, but printer pr3 remains. 




MAKING AN OUTPUT REQUEST— THE “lp” 
COMMAND 

Once LP destinations have been created, users may request output by 
using the lp command. The request id that is returned may be used 
to see if the request has been printed or to cancel the request. 

The LP program determines the destination of a request by checking 
the following list in order: 


• If the user specifies -d dest on the command line, then the 
request is routed to dest. 

• If the environment variable LPDEST is set, the request is 
routed to the value of LPDEST. 

• If there is a system default destination, then the request is 
routed there. 

• The request is rejected. 
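
The order of precedence above can be sketched in shell as follows. 
This is an illustration, not the logic of lp itself; the function 
name resolve_dest and the variable SYSTEM_DEFAULT (standing in for 
the system default destination) are hypothetical. 

```shell
#!/bin/sh
# Sketch only: choose a destination in the order given in the text.
# resolve_dest DEST
#   DEST  value given with -d, or the empty string if -d was not used
# LPDEST is read from the environment. SYSTEM_DEFAULT is a
# hypothetical variable standing in for the system default destination.
resolve_dest() {
    if [ -n "$1" ]; then
        echo "$1"
    elif [ -n "${LPDEST:-}" ]; then
        echo "$LPDEST"
    elif [ -n "${SYSTEM_DEFAULT:-}" ]; then
        echo "$SYSTEM_DEFAULT"
    else
        echo "request rejected: no destination" >&2
        return 1
    fi
}

SYSTEM_DEFAULT=cl1
LPDEST=xyz
resolve_dest zoo        # -d given: prints zoo
resolve_dest ""         # no -d: LPDEST is used, prints xyz
LPDEST=
resolve_dest ""         # LPDEST empty: prints cl1
```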

EXAMPLES 

1. There are at least four ways to print the password file on the 
system default destination: 

lp /etc/passwd 
lp < /etc/passwd 
cat /etc/passwd | lp 
lp -c /etc/passwd 

The last three ways cause copies of the file to be printed, 
whereas the first way prints the file directly. Thus, with the 
first way, if the file is modified between the time the request is 
made and the time it is actually printed, then the changes will be 
reflected in the output. 

2. Print two copies of file abc on printer xyz and title the output 
“my file”: 

pr abc | lp -dxyz -n2 -t"my file" 




3. Print file xxx on a Diablo* 1640 printer called zoo in 12-pitch and 
write to the user’s terminal when printing has completed: 

lp -dzoo -o12 -w xxx 

In this example, “12” is an option that is meaningful to the 
model Diablo 1640 interface program that prints output in 
12-pitch mode [see lpadmin(1M)]. 


FINDING LP STATUS— LPSTAT 

The lpstat command is used to find status information about LP 
requests, destinations, and the scheduler. 

EXAMPLES 

1. List the status of all pending output requests made by this user: 

lpstat 

The status information for a request includes the request id, the 
logname of the user, the total number of characters to be 
printed, and the date and time the request was made. 

2. List the status of printers p1 and p2: 

lpstat -pp1,p2 


* Registered trademark of Xerox Corporation 




CANCELING REQUESTS— CANCEL 

The LP requests may be canceled using the cancel command. Two 
kinds of arguments may be given to the command— request ids and 
printer names. The requests named by the request ids are canceled 
and requests that are currently printing on the named printers are 
canceled. Both types of arguments may be intermixed. 

EXAMPLE 

Cancel the request that is now printing on printer xyz: 


cancel xyz 


If the user that is canceling a request is not the same one that made 
the request, then mail is sent to the owner of the request. LP allows 
any user to cancel requests in order to eliminate the need for users to 
find LP administrators when unusual output should be purged from 
printers. 


ALLOWING AND REFUSING REQUESTS— 
ACCEPT AND REJECT 

When a new destination is created, lp will reject requests that are 
routed to it. When the LP administrator is sure that it is set up 
correctly, he or she should allow lp to accept requests for that 
destination. The accept command performs this function. 


Sometimes it is necessary to prevent lp from routing requests to 
destinations. If printers have been removed or are waiting to be 
repaired or if too many requests are building for printers, then it 
may be desirable to cause lp to reject requests for those destinations. 
The reject command performs this function. After the condition 
that led to the rejection of requests has been remedied, the accept 
command should be used to allow requests to be taken again. 


The acceptance status of destinations is reported by the -a option of 
lpstat. 





EXAMPLES 

1. Cause lp to reject requests for destination xyz: 

/usr/lib/reject -r"printer xyz needs repair" xyz 

Any users that try to route requests to xyz will encounter the 
following: 

$ lp -dxyz file 

lp: can not accept requests for destination "xyz" 

— printer xyz needs repair 

2. Allow lp to accept requests routed to destination xyz: 

/usr/lib/accept xyz 


ALLOWING AND INHIBITING PRINTING- 
ENABLE AND DISABLE 

The enable command allows the LP scheduler to print requests on 
printers. That is, the scheduler routes requests only to the interface 
programs of enabled printers. Note that it is possible to enable a 
printer and at the same time prevent further requests from being 
routed to it. 


The disable command will undo the effects of the enable command. 
It prevents the scheduler from routing requests to printers, 
independently of whether or not lp is allowing them to accept 
requests. Printers may be disabled for several reasons including 
malfunctioning hardware, paper jams, and end of day shutdowns. If 
a printer is busy at the time it is disabled, then the request that was 
printing will be reprinted in its entirety either on another printer (if 
the request was originally routed to a class of printers) or on the 
same one when the printer is reenabled. The -c option causes the 
currently printing requests on busy printers to be canceled in 
addition to disabling the printers. This is useful if strange output is 
causing a printer to behave abnormally. 




EXAMPLE 

Disable printer xyz because of a paper jam: 

$ disable -r"paper jam" xyz 
printer "xyz" now disabled 


Find the status of printer xyz: 


$ lpstat -pxyz 

printer "xyz" disabled since Jan 5 10:15 - 
paper jam 


Now, reenable xyz: 

$ enable xyz 

printer "xyz" now enabled 


MOVING REQUESTS BETWEEN 
DESTINATIONS— LPMOVE 

Occasionally, it is useful for LP administrators to move output 
requests between destinations. For instance, when a printer is down 
for repairs, it may be desirable to move all of its pending requests to 
a working printer. This is one way to use the lpmove command. 
The other use of this command is to move specific requests to a 
different destination. Lpmove will refuse to move requests while 
the LP scheduler is running. 

EXAMPLES 


1. Move all requests for printer abc to printer xyz: 


$ /usr/lib/lpmove abc xyz 

All of the moved requests are renamed from abc-nnn to xyz-nnn. 
As a side effect, destination abc is no longer accepting further 
requests. 





2. Move requests zoo-543 and abc-1200 to printer xyz: 


$ /usr/lib/lpmove zoo-543 abc-1200 xyz 


The two requests are now renamed xyz-543 and xyz-1200. 

STOPPING AND STARTING THE 
SCHEDULER— LPSHUT AND LPSCHED 

Lpsched is the program that routes the output requests that were 
made with lp through the appropriate printer interface programs to 
be printed on line printers. Each time the scheduler routes a request 
to an interface program, it records an entry in the log file, 
/usr/spool/lp/log. This entry contains the logname of the user that 
made the request, the request id, the name of the printer that the 
request is being printed on, and the date and time that printing first 
started. In the case that a request has been restarted, more than one 
entry in the log file may refer to the request. The scheduler also 
records error messages in the log file. When lpsched is started, it 
renames /usr/spool/lp/log to /usr/spool/lp/oldlog and starts a new 
log file. 


No printing will be performed by the LP system unless lpsched is 
running. Use the command 


lpstat -r 


to find the status of the LP scheduler. 


Lpsched is normally started by the /etc/rc program as described 
above and continues to run until the UNIX system is shut down. The 
scheduler operates in the /usr/spool/lp directory. When it starts 
running, it will exit immediately if a file called SCHEDLOCK exists. 
Otherwise, it creates this file in order to prevent more than one 
scheduler from running at the same time. 
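
The lock-file convention can be illustrated with a short shell
sketch. This is not the lpsched implementation, whose internals are
not described here; the function name acquire_schedlock is an
assumption, and the test-then-create sequence shown is not atomic.
It does show why a stale SCHEDLOCK left by an earlier invocation
keeps a new scheduler from starting.

```shell
# Sketch of the SCHEDLOCK convention described above.
# The directory is a parameter so the function can be tried in a
# scratch directory; lpsched itself operates in /usr/spool/lp.
acquire_schedlock() {
    LPDIR=$1
    if [ -f "$LPDIR/SCHEDLOCK" ]; then
        echo "SCHEDLOCK exists; another scheduler may be running" >&2
        return 1        # give up immediately, as lpsched does
    fi
    : > "$LPDIR/SCHEDLOCK"   # create the lock file
}
```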

Occasionally, it is necessary to shut down the scheduler in order to 
reconfigure LP or to rebuild the LP software. The command 

/usr/lib/lpshut 




causes lpsched to stop running and terminates all printing activity. 
All requests that were in the middle of printing will be reprinted in 
their entirety when the scheduler is restarted. 


To restart the LP scheduler, use the command 


/usr/lib/lpsched 

Shortly after this command is entered, lpstat should report that the 
scheduler is running. If not, it is possible that a previous invocation 
of lpsched exited without removing SCHEDLOCK, so try the 
following: 

rm -f /usr/spool/lp/SCHEDLOCK 
/usr/lib/lpsched 

The scheduler should be running now. 


PRINTER INTERFACE PROGRAMS 

Every LP printer must have an interface program which does the 
actual printing on the device that is currently associated with the 
printer. Interface programs may be shell procedures, C programs, or 
any other executable program. The LP model interfaces are all 
written as shell procedures and can be found in the 
/usr/spool/lp/model directory. At the time lpsched routes an 
output request to a printer P, the interface program for P is invoked 
in the directory /usr/spool/lp as follows: 

interface/P id user title copies options file ... 

where 

id        is the request id returned by lp 
user      is logname of user who made the request 
title     is optional title specified by the user 
copies    is number of copies requested by user 
options   is a blank-separated list of class or 
          printer-dependent options specified by user 
file      is the full pathname of a file to be printed 





EXAMPLES 

The following examples are requests made by user “smith” with a 
system default destination of printer “xyz”. Each example lists an lp 
command line followed by the corresponding command line generated 
for printer xyz’s interface program: 

1. lp /etc/passwd /etc/group 

interface/xyz xyz-52 smith "" 1 "" /etc/passwd /etc/group 

2. pr /etc/passwd | lp -t"users" -n5 

interface/xyz xyz-53 smith users 5 "" /usr/spool/lp/request/xyz/d0-53 

3. lp /etc/passwd -oa -ob 

interface/xyz xyz-54 smith "" 1 "a b" /etc/passwd 

When the interface program is invoked, its standard input comes 
from /dev/null and both the standard output and standard error 
output are directed to the printer’s device. Devices are opened for 
reading as well as writing when file modes permit. In the case where 
a device is a regular file, all output is appended to the end of the file. 


Given the command line arguments and the output directed to a 
device, interface programs may format their output in any way they 
choose. Interface programs must ensure that the proper stty modes 
(terminal characteristics such as baud rate, output options, etc.) are 
in effect on the output device. This may be done in a shell interface 
only if the device is opened for reading: 


stty mode ... <&1 

That is, take the standard input for the stty command from the 
device. 




When printing has completed, it is the responsibility of the interface 
program to exit with a code indicative of the success of the print job. 
Exit codes are interpreted by lpsched as follows: 

CODE               MEANING TO LPSCHED 

0                  The print job has completed successfully. 

1 to 127           A problem was encountered in printing this 
                   particular request (e.g., too many 
                   nonprintable characters). This problem will 
                   not affect future print jobs. Lpsched 
                   notifies users by mail that there was an error 
                   in printing the request. 

greater than 127   These codes are reserved for internal use by 
                   lpsched. Interface programs must not exit 
                   with codes in this range. 

When problems that are likely to affect future print jobs occur (e.g., 
a device filter program is missing), the interface programs would be 
wise to disable printers so that print requests are not lost. When a 
busy printer is disabled, the interface program will be terminated 
with signal 15. 
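
The calling convention and exit codes can be illustrated with a
minimal shell sketch. It is not one of the LP model interfaces; the
function name print_request and the banner line are assumptions made
for illustration. The function writes a banner and the requested
number of copies of each file to its standard output, which lpsched
has already directed to the printer's device, and returns 0 on
success or 1 if a file cannot be read.

```shell
# Hypothetical minimal interface, for arguments in the order
#   id user title copies options file ...
print_request() {
    id=$1 user=$2 title=$3 copies=$4 options=$5
    shift 5                     # the remaining arguments are files
    echo "Request: $id  User: $user  Title: $title  Options: $options"
    n=1
    while [ "$n" -le "$copies" ]; do
        for f in "$@"; do
            # A code from 1 to 127 reports a problem with this
            # request only.
            cat "$f" || return 1
        done
        n=`expr $n + 1`
    done
    return 0
}
```

An interface file built on this sketch would end with the two
commands print_request "$@" and exit $? so that the return value
becomes the exit code seen by lpsched.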




SETTING UP HARD-WIRED DEVICES AND 
LOGIN TERMINALS AS LP PRINTERS 

Hard-wired Devices 

As an example of how to set up a hard-wired device for use as an LP 
printer, consider using tty line 15 as printer xyz. As superuser, 
perform the following: 

1. Avoid unwanted output from non-LP processes and ensure that 
LP can write to the device: 


$ chown lp /dev/tty15 
$ chmod 600 /dev/tty15 


2. Change /etc/inittab so that tty15 is not a login terminal. In 
other words, ensure that /etc/getty is not trying to log users in 
at this terminal. Change the entries for tty15 to: 

15:2:off:/etc/getty -t60 tty15 1200 


Enter the command: 


$ telinit Q 


If there is currently an invocation of /etc/getty running on tty15, 
kill it. When the UNIX system is rebooted, tty15 will be 
initialized with default stty modes. Thus, it is up to LP interface 
programs to establish the proper baud rate and other stty modes 
for correct printing to occur. 

3. Introduce printer xyz to LP using the model prx interface 
program: 


$ /usr/lib/lpadmin -pxyz -v/dev/tty15 -mprx 




4. When xyz is created, it will initially be disabled and lp will be 
rejecting requests routed to it. If it is desired, allow lp to accept 
requests for xyz: 

/usr/lib/accept xyz 

This will allow requests to build up for xyz and to be printed 
when it is enabled at a later time. 

5. When it is desired for printing to occur, be sure that the printer 
is ready to receive output. For several printers, this means that 
the top of form has been adjusted and that the printer is on-line. 
Enable printing to occur on xyz: 

enable xyz 

When requests have been routed to xyz, they will begin printing. 
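
The LP commands from steps 1, 3, 4, and 5 can be collected in one
place, as in the following sketch. The function name setup_hardwired
and the variable RUN are assumptions made for illustration. Step 2
(editing /etc/inittab and running telinit Q) is deliberately left to
be done by hand. With RUN left at its default of echo, the commands
are only printed.

```shell
# Sketch only: LP-side setup of a hard-wired printer.
# RUN is a hypothetical dry-run hook. Leave it unset to print the
# commands; set RUN= (empty) to execute them as superuser.
# Note: some later shells have a builtin named enable; this sketch
# assumes the LP enable command is the one found.
RUN=${RUN-echo}

setup_hardwired() {
    printer=$1 device=$2 model=$3
    $RUN chown lp "$device" &&
    $RUN chmod 600 "$device" &&
    $RUN /usr/lib/lpadmin -p"$printer" -v"$device" -m"$model" &&
    $RUN /usr/lib/accept "$printer" &&
    $RUN enable "$printer"
}
```

For the example above, the call would be
setup_hardwired xyz /dev/tty15 prx.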

Login Terminals 

Login terminals may also be used as LP printers. To do this for a 
Diablo 1640 terminal called abc, perform the following: 

1. Introduce printer abc to LP using the model 1640 interface 
program: 


$ /usr/lib/lpadmin -pabc -v/dev/null -m1640 -l 


Note that /dev/null is used as abc’s device because we will 
specify the actual device each time that abc is enabled. This 
device may be different from day to day. When abc is created, it 
will initially be disabled; and lp will be rejecting requests routed 
to it. If it is desired, allow lp to accept requests for abc: 

/usr/lib/accept abc 

This will allow requests to build up for abc and to be printed 
when it is enabled at a later time. It is not advisable to enable 
abc for printing, however, until the following steps have been 
taken. 





2. Log terminal in if this has not already been done. 

3. Assuming the tty(1) command reports that this terminal is 
/dev/tty02, associate this device with printer abc: 

$ /usr/lib/lpadmin -pabc -v/dev/tty02 


Note that lpadmin may be used only by an LP administrator. If 
it is desired for other users to routinely perform this step, then 
an LPA may establish a program owned by lp or by root with 
set-user-id permission that performs this function. 

4. When it is desired for printing to occur, be sure that the printer 
is ready to receive output. For several printers, this means that 
the top of form has been adjusted. Enable printing to occur on 
abc: 


enable abc 


When requests have been routed to abc, they will begin printing. 

5. When all printing has stopped on abc or when you want it back 
as a regular login terminal, you may prevent it from printing 
more output: 


$ disable abc 

printer "abc" now disabled 


If abc is enabled when the UNIX system is rebooted or when 
lpsched is restarted, it will be disabled automatically. 




SUMMARY 

The administrative functions of the LP administrator have been 
described in detail. These functions include configuring and 
reconfiguring LP; maintaining printer interface programs; accepting, 
rejecting, and moving print requests; stopping and starting the LP 
scheduler; and enabling and disabling printers. LP offers 
administrators the following advantages over other centrally 
supported printer packages: 


• Printers may be grouped into classes. 

• LP may be configured to meet the needs of each site. 

• Administrators may supply interface programs to format 
output in any way desirable. 

• LP functions are performed by simple commands and not by 
hand. 






Chapter 8 

UNIX SYSTEM REMOTE JOB ENTRY 


GENERAL 

This chapter contains information on the design and operation of the 
UNIX System Remote Job Entry (RJE). In this document, RJE 
refers to the facilities provided by the UNIX operating system and not to 
the remote job entry feature of the HASP and JES2 subsystems 
produced by International Business Machines (IBM). 


The information contained in this chapter should be used to augment 
the information contained in the UNIX System V Administrator 
Reference Manual [rje(8)]. There will be assumptions made 
concerning allocation of responsibilities between UNIX system and 
IBM operations, hardware configuration, etc. Although these 
assumptions may not fully apply to your location, they should not 
interfere with the intent of this document. 


The major topics discussed in this document are as follows: 


• SETTING UP— Hardware requirements and RJE generation on 
the IBM and UNIX systems. 

• DIRECTORY STRUCTURES — The controlling RJE directory 
structure and a typical RJE subsystem directory structure. 

• RJE PROGRAMS— Programs that make up an RJE subsystem. 

• UTILITY PROGRAMS— Programs available for debugging or 
tracing. 

• RJE ACCOUNTING— The accounting of jobs done by RJE and 
some methods for using this accounting data. 

• TROUBLESHOOTING — Error recovery and procedures for 
identifying and fixing RJE problems. 





Facilities 

Discussions will focus on a hypothetical RJE connection between a 
UNIX system, whose nodename is pwba, and an IBM 370/168, 
referred to as B. We also assume that pwba is connected to an IBM 
370/158, referred to as C. The UNIX operating system machine 
emulates an IBM System/360 remote multileaving work station. 


SETTING UP 


Hardware 

In the remainder of this guide, the hardware described below will be 
referred to as the physical device; and its name will be referred to as 
device?, where ? is the device number. 

On DEC computers, RJE requires the use of a KMC11-B 
microprocessor to control either a single-line interface or eight-line 
interface. For KMC11-B control of a single RJE line, the following 
hardware is required: 

• KMC11-B Microprocessor— used to drive the RJE line. 

• DMC11-DA or DMC11-FA line unit— the DMC11-DA interfaces 
with AT&T 208 and 209 synchronous modems or equivalent. 
The DMC11-FA interfaces with AT&T 500A L1/5 synchronous 
modems or equivalent. 

Each KMC/DMC pair supports a single RJE line that may operate at 
speeds up to 56 KB. On the KMC11 line unit, the NO CRC switch 
(switch S2 in switch pack number 1) should be in the ON position. 


For KMC11-B control of from one to eight RJE lines, the following 
hardware is required: 

• KMC11-B Microprocessor— used to drive the RJE line. 

• DMS11-DA Eight-line synchronous communication 
multiplexor. 




• DM11-BA Modem control multiplexor. 


These three devices are collectively known as a KMS11. A KMS11 
supports up to eight low-speed (9.6KB or lower) RJE lines or up to 
four intermediate-speed (19.2KB or lower) lines. 

If a KMC/DMC pair is used for RJE, then the KMC11-B must be 
configured on the host system. If a KMS11 is used, both the KMC11-B 
and the DM11-BA must be configured on the host system. The use 
of a KMS11 requires that dmkset(1M) be invoked (typically in 
/etc/brc) before loading the RJE protocol script into the KMC11-B. 


IBM Generation 

The following applies to the host IBM system. The remote line to the 
UNIX operating system machine should be described as a System/360 
remote work station. The following parameters must be initialized 
and must agree with their counterparts on the UNIX operating 
system machine: 


• Number of printers (NUMPR)— The number of logical printers 
(up to seven). 

• Number of punches (NUMPU)— The number of logical punches 
(up to seven). 

• Number of readers (NUMRD)— The number of logical readers 
(up to seven). 





The JES2 parameters for the hypothetical connection to IBM system 
B are as follows: 

RMT5     S/360,LINE=5,CONSOLE,MULTI,TRANSP,NUMPR=5, 
         NUMPU=1,NUMRD=5,ROUTECDE=5 
R5.PR1   PRWIDTH=132 
R5.PR2   PRWIDTH=132 
R5.PR3   PRWIDTH=132 
R5.PR4   PRWIDTH=132 
R5.PR5   PRWIDTH=132 
R5.PU1   NOSUSPND 
R5.RD1   PRIOINC=0,PRIOLIM=14 
R5.RD2   PRIOINC=0,PRIOLIM=14 
R5.RD3   PRIOINC=0,PRIOLIM=14 
R5.RD4   PRIOINC=0,PRIOLIM=14 
R5.RD5   PRIOINC=0,PRIOLIM=14 


System pwba is referenced by line 5 (LINE=5), remote 5 (RMT5). It 
is defined as having a console for the rjestat(1C) command, five 
printers, one punch, and five readers. Although you may have up to 
seven printers or punches, the total number of printers and punches 
may not exceed eight. The line is described as a transparent 
(TRANSP), multileaving (MULTI) line. The remaining information 
describes attributes of the printers, punches, and readers. 


Normally, separator pages are transmitted with IBM print files. The 
UNIX system RJE does not remove separator pages. To prevent 
transmission of separator pages on printer 1 of the previous example, 
its attributes would be: 


R5.PR1 PRWIDTH=132,NOSEP 


NOSEP should be included for all printers when separator pages are 
not desired. Most IBM systems can also be told via a console 
command to cancel transmission of separator pages on printers. This 
can be done from the IBM system console or from the remote UNIX 
operating system machine via rjestat. For example, the following 
JES2 command would cancel separator page transmission on printer 
1: 


$TR5.PR1,S=N 




UNIX System Generation 

If the RJE remote dialing facility is to be used, the administrator 
must make sure that the definition for the RJECU in the file 
/usr/include/rje.h is the device to be used for remote dialing. By 
convention, RJECU is defined to be /dev/dn2 for DEC processors. To 
compile and install RJE, the normal make(1) procedures are used 
(see the "Setting up the UNIX System" chapter of this guide). Once 
an RJE subsystem has been installed, the remote line must be 
described in the configuration file /usr/rje/lines. This file as it 
exists on the hypothetical system pwba is as follows: 


B pwba /usr/rje1 rje1 vpm0 5:5:1 1200:512:y 
C pwba /usr/rje2 rje2 vpm1 1:1:1 1200:512 

The file /usr/rje/lines is accessed by all components of RJE. Each line of 
the table (maximum of eight) defines an RJE connection. Its seven 
columns may be labeled host, system, directory, prefix, device, 
peripherals, and parameters. These columns are described as 
follows: 


• host— The IBM system name, e.g., A, B, C. This string can be 
up to six characters long. 

• system— The UNIX system nodename [see uname(1)]. 

• directory— The directory name of the servicing RJE subsystem 
(e.g., /usr/rje2). 

• prefix— The string prepended to most files and programs in the 
directory (i.e., rje2). 

• device— The name of the controlling virtual protocol machine 
(VPM) device, with /dev/ excised. In order to specify a VPM 
device, all VPM software must be installed, and the proper 
special files must be made [see vpm(7) and mknod(1M)]. 
Also, the permission modes of the VPM device must be set by 
the system administrator to allow read and write access by the 
RJE software. 

• peripherals — Information on the logical devices (readers, 
printers, punches) used by RJE. There are three subfields. 





Each subfield is separated by a colon (:) and is described as follows: 

1. Number of logical readers. 

2. Number of logical printers. 

3. Number of logical punches. 


Note: The number of peripherals specified for an RJE 
subsystem must agree with the number of peripherals 
that have been described on the remote machine for 
that line. 


• parameters— This field contains information on the type of 
connection to make. Each subfield is separated by a colon (:). Any or 
all subfields may be omitted; however, the subfields are positional. 
All but trailing delimiters must be present. For example, in: 


1200:512:::9-555-1212:400 


subfields 3 and 4 are missing. Each subfield is defined as 
follows: 

1. space— This subfield specifies the amount of space (S) in 
blocks that RJE tries to maintain on file systems it 
touches. The default is 0 blocks. Send(1C) will not 
submit jobs and rjeinit issues a warning when less than 
1.5S blocks are available; rjerecv stops accepting output 
from the host when the capacity falls to S blocks; RJE 
becomes dormant until conditions improve. If the space 
on the file system specified by the user on the “usr=” 
card would be depleted to a point below S, the file will 
be put in the job subdirectory of the connection’s home 
directory rather than in the place that the user 
requested. 

2. size — This subfield specifies the size in blocks of the 
largest file that can be accepted from the host without 
truncation taking place. The default is no truncation. 





Note that the UNIX system has a default 1 megabyte file size 
limit. 

3. badjobs — This subfield specifies what to do with 
undeliverable returning jobs. If an output file is 
undeliverable for any reason other than file system 
space limitations (e.g., missing or invalid “usr=” card) 
and this subfield contains the letter y, the output will be 
retained in the job subdirectory of the home directory; 
and login rje is notified via mail(1). If this subfield has 
any other value, undeliverable output will be discarded. 
The default is “n”. 

4. console— This subfield specifies the status of the 
interactive status terminal for this line. If the subfield 
contains an i, the status console facilities of rjestat will 
be inhibited. In all cases, the normal noninteractive 
uses of rjestat will continue to function. The default is 
“y”. 

5. dial-up — This subfield contains a telephone number to be 
used to call a host machine. The telephone number may 
contain the digits 0 through 9 and the character -, 
which denotes a pause. If the telephone number is not 
present, no dialing is attempted; and a leased line is 
assumed. 

6. transmission block size — This subfield specifies the size 
(in bytes) of transmission blocks to be sent to the IBM 
host for a particular RJE subsystem. The maximum 
permitted block size is 512. The default value is also 512. 
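
The column and subfield layout can be checked with a short awk
sketch that labels each part of an entry. The function name
show_lines_entry and the output labels are assumptions made for
illustration; nothing in RJE produces this output. Entries are read
from the standard input.

```shell
# Sketch: label the seven columns of each /usr/rje/lines entry and
# the colon-separated subfields of columns 6 and 7.
show_lines_entry() {
    awk '{
        split($6, per, ":")   # readers:printers:punches
        split($7, par, ":")   # positional parameter subfields
        printf "host=%s system=%s directory=%s prefix=%s device=%s\n",
            $1, $2, $3, $4, $5
        printf "readers=%s printers=%s punches=%s\n",
            per[1], per[2], per[3]
        printf "space=%s size=%s badjobs=%s console=%s dial-up=%s blocksize=%s\n",
            par[1], par[2], par[3], par[4], par[5], par[6]
    }'
}
```

Omitted subfields print as empty values. For example, an entry whose
parameters column is 1200:512:::9-555-1212:400 shows empty badjobs
and console values, so the defaults apply to them.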


When multiple readers have been specified, jobs that are submitted 
for transmission to IBM are assigned to the reader with the fewest 
cards on it. Each reader gets an equal amount of service. This 
prevents smaller jobs from having to wait for a previously submitted 
large job to be transmitted. When multiple printers or punches have 
been specified, returning jobs get assigned to free printers (or 
punches) allowing smaller output files to bypass large output files. 
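
The reader-selection rule can be sketched in a few lines of awk.
This is an illustration only; the function name fewest_cards and the
input layout (one line per reader, giving a reader number and its
current card count) are assumptions and do not correspond to any RJE
file.

```shell
# Sketch: choose the reader that currently holds the fewest cards.
# Input lines: "reader-number card-count" (hypothetical layout).
# On a tie, the reader listed first is chosen.
fewest_cards() {
    awk 'NR == 1 || $2 + 0 < min { min = $2 + 0; best = $1 }
         END { print best }'
}
```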


Deciding how many peripherals to specify depends on the use of that 
RJE subsystem. If an RJE subsystem is heavily used for off-line 





printing (i.e., output does not return to the UNIX operating system 
machine), the administrator would want to specify multiple readers 
but would not have a need for multiple printers or punches. 


DIRECTORY STRUCTURES 


Controlling Directory 

The controlling directory used by RJE is /usr/rje. This directory 
contains RJE programs for use by separate RJE subsystems (e.g., 
rje1, rje2, rje3) and the shell queuer’s directory. Most RJE programs 
existing here have been compiled such that each RJE subsystem 
shares the text of these programs. A snapshot of this directory on 
our hypothetical machine is as follows: 


-rwxr-xr-x   3 rje   rje    4068 Mar  4 10:42 cvt 
-rw-r--r--   1 rje   rje      42 Apr 10 09:52 lines 
-rwxr-xr-x   3 rje   rje   15096 Apr 10 13:01 rjedisp 
-rwxr-xr-x   3 rje   rje    2328 Mar  4 10:21 rjehalt 
-rwxr-xr-x   3 rje   rje   10396 Apr 15 10:07 rjeinit 
-r-x------   3 rje   rje     785 Apr  8 09:00 rjeload 
-rwsr-xr-x   3 rje   rje    5040 Mar 27 09:28 rjeqer 
-rwxr-xr-x   3 rje   rje    4072 Apr  1 15:40 rjerecv 
-rwxr-xr-x   3 rje   rje    3888 Mar 27 09:35 rjexmit 
-rwsr-xr-x   1 root  rje    2696 Mar 27 14:42 shqer 
-rwxr-xr-x   3 rje   rje    5920 Apr  2 15:47 snoop 
drwxr-xr-x   2 rje   rje      80 Mar 25 13:26 sque 


The RJE subsystems are generated in their own directory by linking 
the program names in this directory to the appropriate names in the 
subsystem directory. The programs are described in the part “RJE 
PROGRAMS”. The file lines is the configuration file used by all RJE 
subsystems. The directory sque is used by the shell queuer (shqer). 
This directory contains: 


-rw-r--r--   1 rje   rje       0 Feb 14 14:04 errors 
-rw-r--r--   1 rje   rje       0 Feb 14 14:04 log 


When shqer has work to do, the files log and errors will be of 
nonzero length; and temporary files (tmp*) will also appear here. 




Subsystem Directory 

The RJE subsystem described in this part maintains the connection 
between pwba and IBM B and will be referred to as rje1. The first 
line of /usr/rje/lines describes rje1. As noted in this file, rje1 runs 
in the directory /usr/rje1. A snapshot of this directory is as follows: 


-rw-r--r--   1 rje   rje    4990 Apr 15 08:30 acctlog 
-rwxr-xr-x   3 rje   rje    4068 Mar  4 10:42 cvt 
-rw-r--r--   1 rje   rje       0 Apr 15 04:02 errlog 
drwxrwxrwx   2 rje   rje     192 Apr 10 09:51 job 
-rw-r--r--   1 rje   rje     194 Apr 15 08:11 joblog 
-rw-r--r--   1 rje   rje       0 Apr 15 08:11 resp 
-rwxr-xr-x   3 rje   rje   15096 Apr 10 13:01 rje1disp 
-rwxr-xr-x   3 rje   rje    2328 Mar  4 10:21 rje1halt 
-rwxr-xr-x   3 rje   rje   10396 Apr 15 10:07 rje1init 
-r-x------   3 rje   rje     785 Apr  8 09:00 rje1load 
-rwsr-xr-x   3 rje   rje    5040 Mar 27 09:28 rje1qer 
-rwxr-xr-x   3 rje   rje    4072 Apr  1 15:40 rje1recv 
-rwxr-xr-x   3 rje   rje    3888 Mar 27 09:35 rje1xmit 
drwxr-xr-x   2 rje   rje     144 Apr 15 08:30 rpool 
[illegible]  1 rje   rje      14 Mar  4 10:21 signon 
-rwxr-xr-x   3 rje   rje    5920 Apr  2 15:47 snoop0 
drwxrwxrwx   2 rje   rje     176 Apr 10 13:03 spool 
drwxr-xr-x   2 rje   rje     224 Apr 10 13:56 squeue 
-rw-r--r--   1 rje   rje       0 Apr 15 10:30 stop 
-rw-r--r--   1 rje   rje     274 Mar  7 20:25 testjob 


The programs rje1*, cvt, and snoop0 are linked to the 
corresponding programs in /usr/rje. The remaining files and their 
uses are as follows: 


• acctlog— Accounting data is stored in this file if it exists. This 
file is the responsibility of the RJE administrator. 

• errlog— Used by rje1 to log errors. It can be useful for 
debugging rje1 problems. 

• joblog— Used by rje1qer and rjestat to notify rje1xmit that 
a job (or console request) has been submitted. It also contains 
the process-group number of the rje1 processes. The program 
cvt can be used to convert this file to a readable form. 

• resp— Contains console messages received from IBM B. These 
messages can be responses for rjestat or IBM responses to 




submitted jobs (i.e., on reader messages). This file is truncated 
if it grows to a size greater than 70,000 bytes. 

• signon— A file that must be created by the system 
administrator and that should contain a character sequence of 
the form: 

/*SIGNON XXXXX 


The X’s should be replaced by the signon identification string 
(obtainable from the IBM host’s system administrator) that 
identifies this RJE subsystem to the IBM host system. 

• stop— Indicates that rje1halt has been executed. The 
existence of this file indicates to rjestat that rje1 has been 
halted by the operator. 

• testjob— A sample job that can be submitted to test the rje1 
subsystem. The job control statements may first have to 
be changed to suit your IBM system. 

When rje1 terminates abnormally, the file dead should appear in this 
directory. This file contains a short message indicating why rje1 is 
not operating and is used by rjestat to report the problem. The 
remaining directories and their uses are as follows: 


• job— Used to save undeliverable jobs if the proper parameter 
has been specified in /usr/rje/lines. The sample job described 
above is also delivered to this directory. This directory should 
be mode 777. 

• rpool— Contains temporary files used to gather output from 
the remote machine. These files are named pr* (for print 
output files) and pu* (for punch output files). Once a complete 
file has been received, the file is dispatched in the proper way 
by rje1disp. 

• spool— Used by send to store temporary files to be submitted 
to the remote machine. This directory must be mode 777. 




• squeue— Used by rje1 to store submitted files until they are 
transmitted. The program rje1qer is used by send to move 
the temporary files in the spool directory to this directory. 


RJE PROGRAMS 

All programs described below, with the exception of rjestat, exist in 
/usr/rje. These programs are “shared text” and are linked (except 
shqer) to the proper names in each subsystem directory. The names 
described below are generic; the programs in the rje2 directory would 
be rje2qer, rje2init, etc. 


Each available RJE subsystem occupies three process slots. The slots 
used are rje?xmit for the transmitter, rje?recv for the receiver, and 
rje?disp for the dispatcher. One additional process slot is used for 
shqer regardless of how many subsystems are available. 


Each RJE subsystem tries to be self-sustaining and logs any errors 
encountered during normal operation in its errlog file. 


Rjeqer 

This program is used by send to queue files for transmission. When 
invoked, it performs the following steps: 

1. Moves temporary pnch(4) format file in spool directory to squeue 
directory. 

2. Writes an entry at end of file joblog containing: 

• name of file to be transmitted 

• submitter’s user ID 

• number of card images in file 

• message level for this job. 

The file joblog is used to notify rjexmit of work to be done. 





3. Notifies user that file has been queued. 

Send determines the host system desired and invokes the proper 
rje?qer by getting the prefix from the lines file (e.g., if sending to 
IBM C from our machine, rje2qer would be invoked). 
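
The lookup can be pictured as a search of the lines file for the
host name. The following sketch is not how send is implemented,
which is not described here; the function name qer_for_host is an
assumption, and the lines file is passed as an argument so that a
test copy can be used. It prints the queuer name formed from the
directory and prefix columns.

```shell
# Sketch: given a host name and a lines file, print the path of the
# matching queuer, built from column 3 (directory) and column 4
# (prefix). Only the first matching entry is used.
qer_for_host() {
    awk -v host="$1" '$1 == host { print $3 "/" $4 "qer"; exit }' "$2"
}
```

With the lines file shown earlier for pwba, qer_for_host C
/usr/rje/lines prints /usr/rje2/rje2qer.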


Rjeload 

This program is used to start an RJE subsystem. Its prefix 
determines the subsystem to start (e.g., rje2load starts rje2). The 
following paragraphs explain which commands should be executed in 
/etc/rc when changing to init state 2 (multiuser). 


Rjeload requires only one argument (device specification) if an RJE 
subsystem utilizes a KMC/DMC hardware configuration. Rjeload 
requires a second argument, and optionally a third argument, if an 
RJE subsystem utilizes a KMS11. The second argument (line number 
specification) must be supplied to indicate which of the eight DMS11 
line interfaces is to be used by the RJE subsystem for communication 
with the IBM host. Valid line numbers are 0 through 7. The third 
argument (’downloadkms’) must be supplied when rjeload is used 
to start the first RJE subsystem that utilizes a particular KMS11 
(this is because a single down-load of the RJE protocol script permits 
the operation of all eight lines controlled by the KMS11). If a 
hypothetical DEC machine has four RJE subsystems, the first using 
a KMC/DMC and the remaining three using a KMS11, then the 
following commands would be used: 


rm -f /usr/rje/sque/log 
su rje -c "/usr/rje1/rje1load device0" 
su rje -c "/usr/rje2/rje2load device1 0 downloadkms" 
su rje -c "/usr/rje3/rje3load device1 1" 
su rje -c "/usr/rje4/rje4load device1 2" 


The file /usr/rje/sque/log is removed to ensure the correct operation 
of shqer. When invoked, rjeload performs the following steps: 

1. Uses VPM device from /usr/rje/lines to link the proper devices 
[see vpmset(1C)]. 




2. Loads device given as argument with RJE protocol script. 

3. Executes rje?init to start rje? processes (e.g., rje2load 
executes rje2init). 


Rjehalt 

This program is used to halt an RJE subsystem. To halt rje2 on 
UNIX operating system machines, /usr/rje2/rje2halt is executed. 
This should be done in the shutdown procedure for your machine to 
ensure graceful termination of RJE. Rjehalt will allow only those 
users with permission to halt an RJE subsystem. Rjehalt uses the 
header on the file joblog to get the process-group of the RJE 
subsystem processes. This group is signaled to terminate. When all 
processes have terminated, rjehalt sends a “signoff” record to the 
host machine. This signoff record is taken from the file signoff 
(ASCII text) if it exists; otherwise, a “/*signoff” record is sent. On 
completion, rjehalt creates the file stop in the subsystem directory. 
The presence of the file stop in a subsystem directory causes rjestat 
to report to users that RJE to the corresponding host has been 
stopped by the operator. 


Rjeinit 

This program initializes an RJE subsystem. It is used by rjeload 
and can be used to restart a subsystem if the VPM script has 
previously been started. Rjeinit should only be executed by user rje. 
Rjeinit fails if there are fewer than 100 blocks or 10 inodes free in the 
file system. It issues a warning if there are fewer than 1.5X blocks 
(where X is the first field in the parameters for that line) or 100 
inodes free in the file system. If rjeinit fails, the reason for the 
failure is reported; and the file dead is created containing “Init 
failed”. This will be reported by rjestat until a subsequent rjeinit 
succeeds. Rjeinit performs the following functions: 

1. Dials a remote host if specified. 

2. Truncates console response file resp. 

3. Sends a signon record to the host. The signon record is taken 
from file signon (ASCII text) if it exists; otherwise, rjeinit sends 
a blank record as a signon. 




4. Sets up pipes for process communication. 

5. Resets process-group for RJE subsystem and restarts error 
logging. 

6. Rebuilds joblog file from jobs queued for transmission. 

7. Notifies rjedisp (via a pipe) of any returned files still remaining 
in rpool directory. 

8. Starts appropriate background processes rje?xmit, rje?recv, 
and rje?disp. 

9. Reports started or not started. 

If failure occurs in a background process, it is reported by that 
process (error logging). The failing process will normally attempt to 
reboot the subsystem by executing rje?init with a + as its argument. 
When rjeinit is executed with + as its argument, this indicates an 
attempted reboot; and rjeinit will behave differently. 
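
The free-space thresholds that rjeinit applies before starting can
be written out as a small decision function. This is a sketch of the
stated rules only, not the rjeinit code; the function name
rjeinit_space_check and the words it prints are assumptions made for
illustration. The 1.5X comparison is done in whole numbers as
2 x blocks < 3 x X.

```shell
# Sketch: classify free space the way the text describes.
# Arguments: free blocks, free inodes, X (the space subfield).
# Prints fail, warn, or ok (illustrative labels).
rjeinit_space_check() {
    blocks=$1 inodes=$2 x=$3
    if [ "$blocks" -lt 100 ] || [ "$inodes" -lt 10 ]; then
        echo fail
    elif [ `expr $blocks \* 2` -lt `expr $x \* 3` ] ||
         [ "$inodes" -lt 100 ]; then
        echo warn       # fewer than 1.5X blocks or 100 inodes
    else
        echo ok
    fi
}
```

With X set to 1200 as in the pwba example, 1500 free blocks draws a
warning because 1500 is below 1800.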

Rjexmit 

This program writes data to the VPM device. The rjexmit process is 
started by rjeinit and runs in the background. When running, 
rjexmit performs the following processes: 

1. Checks joblog file for files to be transmitted. This is done every 
5 seconds when not transmitting data. When transmitting data, 
the joblog is checked after transmitting one block from each 
active reader and console. 

(Reader refers to the logical readers used by RJE. Console refers
to the RJE logical console which is separate from the logical 
readers.) 

2. Queues files from joblog according to first two characters of the 
file name: 


• rd* — These files are queued on the reader with the fewest
cards. Normal use of the send command creates these 
files. 


• sq* — These files are queued on the last available reader to
assure sequential transmission. Using the -x option to
the send command creates these files.

• co* — These files are queued on the console. The rjestat
command creates these files. 

All files described above contain extended binary-coded decimal
interchange code (EBCDIC) data.

3. Sends information to rjedisp (via a pipe) for use in user 
notification of job status. 

4. Builds blocks for transmission from active readers and the 
console. These blocks are built according to the multileaving 
protocol. 

5. Performs following peripheral control: 


• Sends requests to open readers when jobs have been 
assigned to them. These readers are not active until a 
grant is received from rjerecv (via a pipe). 

• Halts and activates readers when waits or starts 
(respectively) are received from rjerecv. 

• Sends printer or punch grants when an open request is 
received from rjerecv. 

6. Notifies rjedisp that a file has been transmitted and unlinks the 
file. 

If rjexmit encounters fatal errors, it creates the dead file with an 
appropriate message and signals the other background processes to 
exit. If possible, rjexmit will attempt to reboot the RJE subsystem 
by executing rjeinit. 
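
The queuing rule in step 2 above can be sketched as follows. This is a hedged illustration, not the rjexmit source: the function name, the reader names, and the use of a dictionary of card counts are assumptions, and "last available reader" is taken here to mean the last reader in the order given.

```python
def choose_queue(filename, readers):
    """Pick a queue for a joblog file from its two-character prefix.

    readers maps a (hypothetical) reader name to the number of cards
    already queued on it, listed in reader order.
    """
    prefix = filename[:2]
    if prefix == "co":
        # Console requests, created by rjestat.
        return "console"
    if prefix == "rd":
        # Normal send files go to the reader with the fewest cards.
        return min(readers, key=readers.get)
    if prefix == "sq":
        # Sequential files always use the last reader (assumption).
        return list(readers)[-1]
    raise ValueError("unknown joblog file prefix: " + prefix)


if __name__ == "__main__":
    cards = {"reader1": 120, "reader2": 40, "reader3": 300}
    for name in ("rd0012", "sq0013", "co0014"):
        print(name, choose_queue(name, cards))
```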





Rjerecv

This program reads data from the VPM device. Rjerecv is
started by rjeinit and runs in the background. When running, 
rjerecv performs the following processes: 

1. Reads blocks of data received from host system. 

2. Handles data received according to its type. The two types of 
data are: 

• Control information— rjerecv performs the following 
peripheral device control: 


a. Notifies rjexmit of grants to its requests to open 
readers. 

b. Passes wait and start reader information to rjexmit. 

c. Passes open requests (for printers and punches) from 
the host to rjexmit. 

• User information— The three major types of user information
received are:

a. Console responses and job status messages. This data 
is appended to the resp file for use by rjestat and 
rjedisp. 

b. The printer output from user jobs. This data is 
collected in temporary files (pr*) in the rpool directory.
When a complete print job has been received, rjerecv 
notifies rjedisp (via a pipe) that the file is to be 
dispatched. 

c. The punch output from user jobs. This data is handled 
the same as printer output except that the rpool files 
are named pu*. 

3. If console response file resp exceeds 70,000 characters, rjerecv 
truncates file. 


4. Rjerecv stops accepting output from the remote machine if the 
number of free blocks in the file system falls below space blocks. 

5. Rjerecv truncates files to size blocks if a received file exceeds 
this value. 

If rjerecv encounters fatal errors, it creates the dead file with an 
appropriate error message, signals the other background processes to 
exit, and reboots the RJE subsystem. 


Rjedisp 

This program dispatches user information. Rjedisp is started by 
rjeinit and runs in the background. When running, rjedisp 
performs the following processing: 

1. Dispatches output. The two types of output are printer and 
punch output. After receiving notification of output ready from 
rjerecv, rjedisp searches for a “usr=” line in the received file. 
The format of a “usr=” line is as follows: 

usr=(user, place, level) 


Rjedisp dispatches output according to the place field. 

2. Dispatches messages. The two types of messages are: 


• Job transmitted— This message is sent to the submitting
user when rjedisp reads this event notice from the 
rjexmit pipe. 

• Output processing— rjedisp dispatches job output
messages according to the options specified on the “usr=” 
card. A normal output message indicates the returned file 
name is ready. 

Messages can be masked by using the level on the “usr=” card. 

3. Whenever output is to be handled by shqer, rjedisp checks that 
shqer is running. This is done by looking for the shqer log file. 
If this file does not exist, rjedisp starts shqer. 
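
As an illustration of the “usr=” line format shown above, the following sketch extracts the three fields from a line of returned output. It is not taken from rjedisp: the function name, the sample card text, and the treatment of missing fields as empty strings are assumptions.

```python
import re

# Matches the text between "usr=(" and the closing ")".
USR_RE = re.compile(r"usr=\(([^)]*)\)")


def parse_usr_line(text):
    """Return the user, place, and level fields, or None if absent."""
    match = USR_RE.search(text)
    if match is None:
        return None
    fields = [field.strip() for field in match.group(1).split(",")]
    # Pad so that a short line still yields three fields (assumption).
    fields += [""] * (3 - len(fields))
    user, place, level = fields[:3]
    return {"user": user, "place": place, "level": level}


if __name__ == "__main__":
    # Hypothetical sample line; the field values are made up.
    print(parse_usr_line("... usr=(jones, /usr/jones/out, 54) ..."))
```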


Shqer 

This program executes user programs when they appear in the place 
field of the “usr=” line in a returned output file (print or punch). 
Shqer is started by rjedisp when the first output file using this 
feature is returned. Subsequent files using this feature are logged 
for execution by rjedisp. When started, shqer performs the 
following processing: 

1. Builds the log file from file names in /usr/rje/sque directory. 
Each log entry is the name of a file (tmp?) that contains the
following information: 

• Name of file to be executed 

• Name of input file (file returned from IBM) 

• Name of IBM job 

• Programmer’s name 

• IBM job number 

• User’s name from “usr=” line 

• User’s login directory 

• Minimum file system space. 

2. Shqer uses two parameters. The first is the delay time between 
log file reads. The second is a nice(2) factor which is applied to
any programs spawned by shqer. These values are defined in
/usr/include/rje.h (QDELAY and QNICE).

3. When each log entry is read, the appropriate program is spawned 
with the following characteristics: 

• The returned RJE file is standard input to the program. 


• The standard and diagnostic outputs are /dev/null. 

• The LOGNAME and HOME variables are set to 
appropriate values. 

• The TZ and PATH variables are set to the following
default values: 

TZ=EST5EDT 

PATH=/bin:/usr/bin 


If different values are desired for these variables, then the 
desired values should be set using keyword parameters at 
the time that the RJE subsystem is started. For example: 

PATH=:/bin su rje -c "/usr/rje1/rje1load device0"


will start the rje1 subsystem, and the PATH variable
passed to programs invoked by shqer will be set to
PATH=:/bin.

• The arguments to the spawned program, in order, are: 


a. A numerical value indicating that file system free 
space is equal or above (0) or below (1) space blocks. 

b. IBM job name. 

c. Programmer’s name. 

d. IBM job number. 

e. User’s login name. 

4. After executing each program, the tmp? file and returned RJE 
file are removed. 
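
The characteristics listed in step 3 can be gathered into one sketch that builds the argument list and environment for a spawned program. This is an assumption-laden illustration, not shqer itself: the function name, the dictionary keys used for a log entry, and the sample values are hypothetical. Only the default TZ and PATH values and the order of the arguments come from the text.

```python
def shqer_invocation(entry, free_blocks, space, overrides=None):
    """Return (argv, env) for a program spawned from one log entry."""
    # Defaults quoted in the text; overrides model keyword parameters
    # given when the RJE subsystem was started.
    env = {"TZ": "EST5EDT", "PATH": "/bin:/usr/bin"}
    env.update(overrides or {})
    env["LOGNAME"] = entry["user"]
    env["HOME"] = entry["home"]
    # 0 means free space is equal to or above space blocks, 1 below.
    flag = "0" if free_blocks >= space else "1"
    argv = [entry["program"], flag, entry["job_name"],
            entry["programmer"], entry["job_number"], entry["user"]]
    return argv, env


if __name__ == "__main__":
    sample = {"program": "/usr/jones/bin/filter", "job_name": "PAYROLL",
              "programmer": "JONES", "job_number": "1234",
              "user": "jones", "home": "/usr/jones"}
    print(shqer_invocation(sample, free_blocks=5000, space=2000))
```

The returned RJE file would be attached as standard input, and the standard and diagnostic outputs directed to /dev/null, when the program is actually started.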


UTILITY PROGRAMS 


Snoop 

Snoop is the generic name of a program that can be used to trace 
the state of a VPM device and its associated communications line. 
Snoop depends on the trace(7) driver for its information. It reads 
trace entries from /dev/trace and converts them into a readable form 
that is printed on the standard output. 

The usable name of snoop for a particular RJE subsystem is 
snoopN, where N is the minor device number of the VPM device. In
our hypothetical system, vpm0 is used by the rje1 subsystem, and
vpm1 is used by the rje2 subsystem. Therefore, /usr/rje1/snoop0 and
/usr/rje2/snoop1 are linked to /usr/rje/snoop.


Each snoop prints trace entries for its associated VPM device. Trace 
entries are printed in the following form: 

sequence type information 


where: 


• sequence specifies the order of trace occurrences. It is a value
between 0 and 99.

• type specifies the action being traced (e.g., transfers, driver
activity). 

• information describes data being transferred and driver 
activity. 


Refer to Figure 8-1 at the end of this chapter for the meaning of the 
trace types and associated information. 
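
A trace entry in the form shown above can be split into its three fields with a few lines of code. The sketch below is illustrative only: the function name is an assumption, and it assumes that any text after a "*" on the line is an explanatory comment to be discarded.

```python
def parse_trace_line(line):
    """Split a snoop line into (sequence, type, information).

    Returns None for lines that do not begin with a sequence number
    between 0 and 99, such as a heading line.
    """
    parts = line.split(None, 2)
    if len(parts) < 2 or not parts[0].isdigit():
        return None
    sequence = int(parts[0])
    if not 0 <= sequence <= 99:
        return None
    rest = parts[2] if len(parts) > 2 else ""
    # Drop a trailing "* comment" if one is present (assumption).
    information = rest.partition("*")[0].strip()
    return (sequence, parts[1], information)


if __name__ == "__main__":
    print(parse_trace_line("4 WR 84 bytes * Signon record written"))
```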


Rjestat 

This program is supplied as a user command. The program’s three 
functions are to list all status messages received from the IBM host 
system pertaining to a particular job, to describe the status of the 
RJE subsystems, and to provide a remote IBM status console. The 
remainder of this part describes these three functions. 


Job Status 

When invoked as


rjestat -jhost jobname


rjestat scans the resp file and outputs to the user all IBM status
messages pertaining to the job jobname.


RJE Status 

When invoked, rjestat reports the status of the RJE subsystems. If 
remote system (“host”) names are specified, only those statuses are 
reported. Rjestat uses the following rules to report the status of
a subsystem: 


• Rjestat prints the contents of the file status if it exists in the
subsystem directory. This file can contain any message the 
administrator wishes to have printed when users use rjestat. 

• If the file dead exists in the subsystem’s directory, the
subsystem is not operating and the reason is contained in the
file. Rjestat reports that RJE to “host” is down and
prints the contents of the dead file as the reason.

• If the file stop exists in the subsystem’s directory, the rjehalt
program has been used to inhibit that RJE subsystem.
Rjestat reports that RJE to “host” has been stopped by the
operator.

• If neither the dead nor the stop file exists, rjestat reports
that RJE to “host” is operating normally.
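
The reporting rules above amount to checking for three files in the subsystem directory. The following sketch shows one way to express them. It is not the rjestat source: the function name and the exact wording of the returned messages are assumptions.

```python
from pathlib import Path


def subsystem_status(directory, host):
    """Return the status lines for one subsystem directory."""
    base = Path(directory)
    lines = []
    # An administrator message in the status file is always printed.
    if (base / "status").exists():
        lines.append((base / "status").read_text().strip())
    abnormal = False
    if (base / "dead").exists():
        reason = (base / "dead").read_text().strip()
        lines.append("RJE to %s is down: %s" % (host, reason))
        abnormal = True
    if (base / "stop").exists():
        lines.append("RJE to %s has been stopped by the operator" % host)
        abnormal = True
    if not abnormal:
        lines.append("RJE to %s is operating normally" % host)
    return lines


if __name__ == "__main__":
    # Hypothetical subsystem directory and host name.
    print(subsystem_status("/usr/rje1", "A"))
```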





Rjestat is supplied as the user’s vehicle for checking the status of 
RJE. It is not meant to be an administrative tool; however, the 
reason for failure can be used to track the problem. 


Status Console 

To use rjestat as a status console, the -shost argument is used.
Rjestat prints the status of the subsystem, then prompts with host: 
if the subsystem is up. Each console request is submitted to the RJE 
processes for transmission, and output is handled as specified. 
Rjestat checks the status prior to submitting each request and will 
tell the user to try later if the subsystem goes down. Rjestat allows 
the RJE or superuser logins to submit other than display requests. 
For a complete description of how to use the status console features, 
see rjestat(1C).


Cvt 

This program converts any subsystem’s joblog file to readable form. 
The first line printed is the process group number of the subsystem 
processes. The remaining output consists of entries in the following 
form: 


file user-id records level 

where “file” is the name of the submitted file, “user-id” is the 
submitter’s user number, “records” is the number of card images, and
“level” is the message level. The “records” and “level” fields are not 
used if the file name is co* (console request submitted by rjestat). 


RJE ACCOUNTING 

Each RJE subsystem will store accounting information in the acctlog 
file if it exists. It is the responsibility of the RJE administrator to 
create and maintain this file in the subsystem’s directory. Entries in 
this file describe RJE line use and are of the following form: 


day time file user records 


Each field is delimited by a tab character. The meaning of each
field is as follows:

1. Day— The day of occurrence in the form mm/dd. 

2. Time— The time of occurrence in the form hh:mm:ss. 

3. File— The name of the UNIX system file. The first two
characters identify its type as follows: 


• rd/sq — The file was transmitted to the remote system. 

• pr— The print output file was received from the remote 
system. 

• pu— The punch output file was received from the remote
system. 

4. User— The user ID of the user responsible for the transfer. 

5. Records— The number of records (card images) transferred for 
this file. 

Since acctlog data is not used by RJE, it should not be allowed to 
grow too large. This can be accomplished by moving or processing 
the file during a system reboot (i.e., in /etc/rc before the RJE 
subsystems are started). 

The following list describes some of the reports that could be 
generated from the acctlog data. Implementation of a program to 
produce accounting reports is the responsibility of the administrator. 


• Periodic Reports — By using the “day” and “time” fields in the 
data, periodic usage reports can be produced. 

• By User Reports— By using the “user” field in the data, usage- 
by-user reports can be produced. 

• By Subsystem Reports— By using the /usr/rje/lines file
information and each acctlog file, a usage-by-subsystem (or 
remote system) report can be produced. 

Other reports can be produced using the type of file, size of jobs, etc. 
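
As a starting point for such a report program, the sketch below totals the records transferred for each user from acctlog entries in the tab-delimited form described above. It is only an example of the approach: the function name and the sample entries are hypothetical, and malformed lines are simply skipped.

```python
from collections import defaultdict


def records_by_user(lines):
    """Sum the records field per user ID from acctlog lines."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Expect: day, time, file, user, records.
        if len(fields) != 5 or not fields[4].isdigit():
            continue
        totals[fields[3]] += int(fields[4])
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical entries; real data would be read from acctlog.
    sample = [
        "03/14\t09:15:02\trd0012\t104\t250",
        "03/14\t09:20:47\tpr0003\t104\t1200",
        "03/14\t10:02:11\tsq0013\t217\t80",
    ]
    print(records_by_user(sample))
```

The same loop could group by the first two characters of the file name to separate transmitted (rd, sq) from received (pr, pu) traffic.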

TROUBLESHOOTING 

This part deals with RJE problems and some methods for resolving 
them. The topics discussed in this part are as follows: 


• Automatic Error Recovery
• Manual Error Recovery
• RJE Problems
• VPM Problems
• Trace Interpretation.


Automatic Error Recovery 

RJE attempts to be self-sustaining with respect to its availability. In 
general, if problems occur on the communications line or the remote 
machine (e.g., a crash), RJE will continually try to restart itself (this 
action will be referred to as “reboot”). For example, if an RJE 
subsystem is started using rjeload but the IBM system is not 
available, a fatal error will occur. The process that detects this error 
(usually rjexmit or rjerecv) will reboot the subsystem by executing 
rjeinit with a + as its argument. When rjeinit detects a + 
argument, it waits 1 minute before attempting to bring up the 
subsystem. 


The rjehalt program can be used to prevent an RJE subsystem from 
rebooting itself when the remote system is not available for a known 
period of time. When the remote system is made available, the 
subsystem may be started in the normal way. 


Manual Error Recovery 

In order to manually recover from errors, one must know how to 
start and stop an RJE subsystem. There are two ways to start an 
RJE subsystem: 

• rje?load— This program loads and starts the VPM script and
executes rje?init. 

• rje?init— This program starts the rje? subsystem. In order to
use this program, the VPM script must have been previously 
loaded and started. 

To stop the rje? subsystem, the rje?halt program should be
executed. This stops the subsystem gracefully and will prevent a 
reboot. 


The rjeload program must be used to start RJE for the first time 
(after a UNIX system reboot). Subsequently, as long as the script is 
running, execution sequences of rjehalt and rjeinit will stop and 
start RJE. 

Manually starting and stopping RJE can be useful in tracking down 
problems. For example, if user jobs are not being submitted to the 
host machine, the following sequence can ease identification of the 
problem: 

1. Halt ailing subsystem. 

2. Start a snoop process in the background with its output 
redirected to a file. 

3. Restart subsystem. 

4. Scan snoop output to determine location of problem. 

The snoop program is the most useful software tool for identifying 
RJE problems. Its uses are described in the subpart “Trace 
Interpretation”. 





RJE Problems 

This part describes problems that can occur in an RJE subsystem.
These problems generally occur when the subsystem has not been set
up properly. The following is a list of things to check to ensure that
an RJE subsystem has been set up properly.

1. IBM description— The description of the remote UNIX operating 
system machine must be consistent with the description in the 
subpart “IBM Generation”. 

2. UNIX system description— The file /usr/rje/lines must be set up
properly. The subpart “UNIX System Generation” describes this 
file in detail. 

3. VPM setup— The VPM software must be installed and the proper 
VPM and physical devices made. The permission modes of the 
VPM and physical devices must be set by the system 
administrator to allow read and write access by the RJE 
programs. Each VPM device must correspond to the proper 
physical device; see vpm(7). 

4. Free space— As a general rule, all file systems must have a 
reasonable amount of free space. File systems containing RJE 
subsystems must have sufficient free space to ensure proper RJE 
operation. 

5. Directories— Each subsystem’s directory and the controlling 
directory should be checked for the following: 

• All needed files exist. 

• The proper prefix is on each applicable RJE program. 

• The link count is correct for files that are linked. 

• All file and directory modes are correct. 

6. Initialization— Peripherals information must be consistent on 
both systems. The line must be started on the IBM system, 
proper hardware connections made, etc. 


Problems with a subsystem are indicated by error messages.
Rjeinit checks for obstacles in bringing up RJE. If an obstacle is
found, an error message indicating the obstacle is printed on the 
error output. If a problem is encountered during normal operation, 
the message is logged in the errlog file. This file, error messages, 
output from snoop, and the checklist above should be used to 
determine and fix any subsystem problems. Generally, if a 
subsystem is set up properly but will not operate, the problem is the 
way the VPM or KMC has been set up, the remote system, or the 
hardware. 


VPM Problems 

After installing the hardware and making the appropriate devices, all 
VPM software and devices must be made [see vpm(7)]. The program 
rjeload links the devices to be used by the corresponding RJE 
subsystem. 


The following is a list of items to check when problems occur: 

1. Proper hardware— The appropriate hardware must be installed. 
Be sure the device is properly described to the system and passes 
diagnostics. 

2. Proper devices— The major and minor device numbers for the 
physical device and VPM devices must be correct. It should also 
be verified that the rjeload program is called with the correct
physical device names. 

3. Script runs— Verify the VPM script is able to run. This is done 
by tracing the proper device with the proper snoop program. 
Snoop will print “started” entries for both the physical device 
and VPM script. If no output appears from snoop when rjeload 
is executed, either the hardware is not working properly or the 
hardware or VPM has not been set up properly. If trace 
information is output for a period of time by snoop and the 
output abruptly stops, a modem problem should be suspected. 
That is, if the RJE cable is disconnected from its associated 
modem or the modem is not powered up and optioned properly, 
then snoop output will "freeze" when the physical device 
attempts to transmit over the RJE link. Output of any other 
type from snoop should indicate where the problem is occurring. 

Trace Interpretation 

This part describes how to interpret trace output from the snoop 
program and gives several examples. 

Lines with type TR are traces from the VPM script. All others are 
driver traces and indicate the following: 


• CL— Activity occurring when the device has been closed.

• OP— Activity occurring when the device has been opened.

• RD— Read from device occurred.

• WR— Write to device occurred.

• ST— Start or stop activity.

• SC— Script termination type; termination value is given.


Figure 8-1 at the end of this chapter enumerates all possible trace 
lines for each type and describes the event. The remainder of this 
part consists of example trace output and its interpretation. 
Comments describing events will appear after the “*” in trace output.
If more than one VPM were running, sequence numbers might not 
appear in order. For clarity, example sequences will be in order. 
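
Because sequence numbers run from 0 to 99 and then wrap, a check for out-of-order entries has to work modulo 100. The sketch below is a minimal example of such a check; the function name is an assumption.

```python
def sequence_breaks(sequences):
    """Return the positions at which the sequence is not consecutive.

    Each number is expected to be one more than its predecessor,
    with 99 followed by 0.
    """
    return [
        i for i in range(1, len(sequences))
        if sequences[i] != (sequences[i - 1] + 1) % 100
    ]


if __name__ == "__main__":
    print(sequence_breaks([98, 99, 0, 1]))
    print(sequence_breaks([5, 6, 9, 10]))
```

A break reported by this check does not by itself indicate an error; it is what one would expect to see when more than one VPM is running.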


Normal RJE Startup 

The following is an example of trace output when RJE has been 
started up. In this case, the remote machine responds to the enquiry 
byte (ENQ). The RJE subsystem signs on to the machine then follows 
the handshaking protocol [exchanging acknowledges (ACKs)]. 


Tracing vpm0

 0  ST  Startack    * Physical device started
 1  TR  Started     * Script started
 2  ST  Start       * VPM Driver start
 3  OP  Opened      * VPM Device open
 4  WR  84 bytes    * Signon record written
 5  TR  S-ENQ       * Enquiry byte sent
 6  TR  R-ACK       * Received acknowledgment
 7  TR  S-BLK       * Sent signon block
 8  TR  R-ACK       * Block acknowledged
 9  TR  S-ACK       * Handshaking
10  TR  R-ACK       *
11  TR  S-ACK       *
12  TR  R-ACK       *
13  TR  S-ACK       *
14  TR  R-ACK       *
15  TR  S-ACK       *
16  TR  R-ACK       *
17  TR  S-ACK       * Handshaking

If any jobs had been submitted via the send command or jobs were
waiting to be returned, the traces would reflect the transfers rather
than handshaking.


RJE Startup— IBM not responding 

This example shows trace output when RJE has been started but does 
not receive a response from the remote machine. In general, the RJE 
script will time-out if a response is not received from the remote 
machine within 3 seconds of the last transmission. When a time-out 
is detected while starting up, the ENQ is retransmitted. This is 
repeated six times before the script gives up. Other time-out 
responses will be discussed later. 





Tracing vpm0

86  ST  Startack    * Physical device started
87  TR  Started     * Script started
88  ST  Start       * VPM Driver start
89  OP  Opened      * VPM device open
90  WR  84 bytes    * Signon record written
91  TR  S-ENQ       * Enquiry byte sent
92  TR  TIMEOUT     * No response to enquiry
93  TR  S-ENQ       * Enquiry byte sent
94  TR  TIMEOUT     * No response
95  TR  S-ENQ       * Enquiry byte sent
96  TR  TIMEOUT     * No response
97  TR  S-ENQ       * Enquiry byte sent
98  TR  TIMEOUT     * No response
99  TR  S-ENQ       * Enquiry byte sent
 0  TR  TIMEOUT     * No response
 1  TR  S-ENQ       * Enquiry byte sent
 2  TR  TIMEOUT     * No response
 3  RD  1 bytes     * 1-byte read (error)
 4  ST  Stopchk     * Safety check
 5  ST  Stopack(0)  * Script termination normal
 6  CL  Clean       * Cleanup done
 7  ST  Stopped     * VPM script stopped
 8  CL  Closed      * VPM device closed


The above sequence will be repeated approximately every minute 
until a positive response is received from the host. During that 
minute, the RJE subsystem is dormant; and the rjestat command 
will report that IBM is not responding. When this occurs, either the 
IBM machine is not available, down, line not started, etc., or there is 
a communications problem somewhere from where the physical 
device transmits data to where it receives data. The RJE 
administrator should first verify that the IBM machine is up, and the 
communications line has been started. If so, a hardware trace of the 
communications line should be done to aid in detecting the problem. 
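
When snoop output has been saved to a file, the startup failure described above can be recognized mechanically by counting enquiries that were answered only by a time-out. The sketch below assumes the information fields of the trace entries have already been extracted into a list; the function names and the default limit of six (taken from the retry count in the text) are assumptions.

```python
def enq_timeouts(infos):
    """Count S-ENQ entries that are immediately followed by TIMEOUT."""
    return sum(
        1 for current, following in zip(infos, infos[1:])
        if current == "S-ENQ" and following == "TIMEOUT"
    )


def host_not_responding(infos, limit=6):
    """True if the enquiry timed out at least limit times."""
    return enq_timeouts(infos) >= limit


if __name__ == "__main__":
    failed = ["Startack", "Started", "Start", "Opened", "84 bytes"]
    failed += ["S-ENQ", "TIMEOUT"] * 6
    failed += ["1 bytes", "Stopchk"]
    print(enq_timeouts(failed), host_not_responding(failed))
```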


Transmitting and Receiving 

This example shows trace output from the start of job transmission 
through its return. For simplicity, only one job is being transmitted 
and returned. 


Tracing vpm0

94  TR  R-ACK       * Handshaking
95  TR  S-ACK       *
96  TR  R-ACK       *
97  TR  S-ACK       * Handshaking
98  WR  4 bytes     * Open reader request written
99  TR  R-ACK       * Handshaking
 0  TR  S-BLK       * Sent open request block
 1  TR  R-OKBLK     * Received block (grant)
 2  TR  S-ACK       * Block acknowledged
 3  RD  7 bytes     * Read seven bytes (grant)
 4  TR  R-ACK       * Handshaking
 5  TR  S-ACK       * Handshaking
 6  WR  481 bytes   * First block written
 7  WR  470 bytes   * Second block written
 8  TR  R-ACK       * Handshaking
 9  TR  S-BLK       * First block sent
10  TR  R-ACK       * Block acknowledged
11  WR  470 bytes   * Third block written
12  TR  S-BLK       * Second block sent
13  TR  R-OKBLK     * Received block (on reader msg)
14  WR  470 bytes   * Fourth block written
15  RD  66 bytes    * Read 66 bytes (on reader msg)
16  TR  S-BLK       * Third block sent
17  TR  R-ACK       * Block acknowledged
18  WR  147 bytes   * Fifth block written
19  TR  S-BLK       * Fourth block sent
20  TR  R-ACK       * Block acknowledged
                    *
                    * More of the same
                    *
93  TR  R-ACK       * Handshaking
94  TR  S-ACK       * Handshaking
95  TR  R-OKBLK     * Received block (request)
96  TR  S-ACK       * Block acknowledged
97  RD  7 bytes     * Read open printer request
98  TR  R-ACK       * Handshaking
99  TR  S-ACK       *
 0  TR  R-ACK       *
 1  TR  S-ACK       *
 2  TR  R-ACK       *
 3  TR  S-ACK       * Handshaking
 4  WR  4 bytes     * Printer grant written
 5  TR  R-ACK       * Handshaking
 6  TR  S-BLK       * Block sent (grant)
 7  TR  R-OKBLK     * First block received
 8  TR  S-ACK       * Block acknowledged
 9  RD  64 bytes    * Read first block
10  TR  R-OKBLK     * Second block received
11  TR  S-ACK       * Block acknowledged
12  RD  505 bytes   * Read second block
13  TR  R-OKBLK     * Third block received
14  TR  S-ACK       * Block acknowledged
15  TR  R-OKBLK     * Fourth block received
16  TR  S-ACK       * Block acknowledged
17  TR  R-ACK       * Handshaking
18  TR  S-ACK       *
19  TR  R-ACK       *
20  TR  S-ACK       * Handshaking
21  RD  470 bytes   * Read third block
22  RD  494 bytes   * Read fourth block
23  TR  R-ACK       * Handshaking
24  TR  S-ACK       * Handshaking
                    *
                    * etc.


Requests and grants are part of the multileaving protocol. When 
jobs are being transmitted and received simultaneously, as in a 
busier RJE subsystem, much less handshaking is involved. Rather 
than acknowledging blocks with ACKs, the protocol allows a block to 
be returned (this implies acknowledgment of the received block). The 
following example shows trace output at a busy time: 


Tracing vpm0

45  TR  R-OKBLK     * Received block
46  TR  S-BLK       * Sent block
47  WR  493 bytes   *
48  RD  496 bytes   *
49  TR  R-OKBLK     * Received block
50  RD  65 bytes    *
51  WR  4 bytes     *
52  TR  S-BLK       * Sent block
53  TR  R-OKBLK     * Received block
54  TR  S-BLK       * Sent block
55  WR  493 bytes   *
56  RD  7 bytes     *
57  TR  R-OKBLK     * Received block
58  WR  493 bytes   *
59  RD  496 bytes   *
60  TR  S-BLK       * Sent block
61  TR  R-OKBLK     * Received block


Notice that since there is work to be done on both sides,
acknowledgments are implied.


Trace Output Indicating Performance Problems 

Trace output is useful in detecting performance problems on the 
remote IBM system or on the local UNIX system. 


The first example shows activity resulting from time-outs occurring 
during normal operation. These time-outs were caused because the 
remote JES3 system has performance problems, and occasionally, 
does not respond in the required 3 seconds. 


Tracing vpm0

27  TR  S-ACK       * Handshaking
28  TR  R-ACK       *
29  TR  S-ACK       *
30  TR  TIMEOUT     * No response
31  TR  S-NAK       * Not acknowledged
32  TR  TIMEOUT     * No response
33  TR  S-NAK       * Not acknowledged
34  TR  R-ACK       * Response
35  TR  S-ACK       * Handshaking
36  TR  R-ACK       *
                    *
                    *
54  TR  R-ACK       *
55  TR  S-ACK       * Handshaking
56  TR  TIMEOUT     * No response
57  TR  S-NAK       * Not acknowledged
58  TR  R-ACK       * Response
59  TR  S-ACK       * Handshaking


The responses to these time-outs are NAKs (not acknowledged). RJE
will respond this way up to six times before giving up and attempting 
a reboot. At this time, rjestat would report that there are “Line 
Errors”. NAK is a request to retransmit the previous response. 
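
To judge how close a subsystem came to the six-NAK limit, one can look for the longest run of NAKs sent without an intervening good response. The sketch below does this for a list of information fields taken from TR entries. The function name is an assumption, and the rule that a TIMEOUT entry does not end a run is a simplification chosen for this example.

```python
def longest_nak_run(infos):
    """Return the longest run of S-NAK entries.

    TIMEOUT entries between NAKs are ignored; any other entry
    (for example R-ACK) ends the current run.
    """
    longest = run = 0
    for info in infos:
        if info == "S-NAK":
            run += 1
            longest = max(longest, run)
        elif info != "TIMEOUT":
            run = 0
    return longest


if __name__ == "__main__":
    trace = ["S-ACK", "R-ACK", "S-ACK", "TIMEOUT", "S-NAK",
             "TIMEOUT", "S-NAK", "R-ACK", "S-ACK"]
    print(longest_nak_run(trace))
```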

In the second example, time-outs occur because the local UNIX 
system has performance problems. When RJE is run on a heavily- 
loaded UNIX system, the RJE script occasionally pauses for a short 
period before sending acknowledgment messages to the remote host. 
Each pause serves to throttle the rate at which data may pass from 
the remote system to the UNIX system. Unfortunately, on a severely 
overloaded UNIX system this mechanism for controlling data flow
cannot guarantee proper RJE operation. Time-outs result from the
UNIX system’s inability to respond to the remote system in the 
required 3 seconds. 


                    *
                    *
                    * UNIX system heavily loaded -
                    * Time-outs successfully avoided
                    * by pausing before sending
                    * acknowledgments to remote system
                    *
                    *
33  TR  S-ACK       * Previous block acknowledged
34  TR  R-OKBLK     * Received block
35  TR  S-ACK       * Block acknowledged
36  TR  R-OKBLK     * Received block
37  RD  505 bytes   *
38  TR  S-ACK       * Block acknowledged
39  TR  R-OKBLK     * Received block
40  TR  S-ACK       * Block acknowledged
41  TR  R-OKBLK     * Received block
42  TR  PAUSE-ACK   * Script pauses before acknowledging
43  TR  S-ACK       * Block acknowledged
44  RD  475 bytes   *
45  RD  466 bytes   *
46  TR  R-OKBLK     * Received block
                    *
                    *
                    *





                    *
                    *
                    * UNIX system severely overloaded -
                    * Time-outs cannot be totally
                    * prevented by pausing before
                    * sending acknowledgments
                    *
17  TR  S-ACK       * Previous block acknowledged
18  TR  R-OKBLK     * Block received
19  TR  S-ACK       * Block acknowledged
20  TR  R-OKBLK     * Block received
21  TR  S-ACK       * Block acknowledged
22  TR  R-OKBLK     * Block received
23  TR  PAUSE-ACK   * Script pauses before acknowledging
24  RD  405 bytes   *
25  RD  503 bytes   *
26  RD  473 bytes   *
27  TR  S-ACK       * Block acknowledged (but not in 3 sec)
28  TR  TIMEOUT     * No response (ACK was sent too late)
29  TR  S-NAK       * Attempt to recover
30  TR  R-OKBLK     * Block received
31  TR  S-ACK       * Block acknowledged
32  RD  417 bytes   *
33  TR  R-OKBLK     * Block received
34  TR  S-ACK       * Block acknowledged
                    *
                    *
                    *


In such instances, the reasons for the overloading of the UNIX 
system should be investigated and remedied. If system overloading 
cannot be prevented, then RJE operation should be limited to those 
time periods when there is less contention for system resources. 


Communication Line Errors 

This example shows trace output from an RJE subsystem that uses a 
dial-up connection. The phone line is noisy and is prone to dropping. 





Tracing vpm0

63  TR  S-ACK       * Handshaking
64  TR  R-ACK       *
65  TR  S-ACK       *
66  TR  R-JUNK      * Noise on the line
67  TR  S-NAK       * Not acknowledged
68  TR  R-ACK       * Recovery
69  TR  S-ACK       *
70  TR  R-ACK       *
71  TR  S-ACK       *
72  TR  R-JUNK      * Noise on the line
73  TR  S-NAK       * Attempting to recover
74  TR  R-JUNK      *
75  TR  S-NAK       *
76  TR  R-JUNK      *
77  TR  S-NAK       *
78  TR  R-JUNK      *
79  TR  S-NAK       *
80  TR  R-JUNK      *
81  TR  S-NAK       *
82  TR  R-JUNK      *
83  RD  1 bytes     * 1-byte read (error)
84  ST  Stopack(0)  * Script termination normal
85  CL  Clean       * Cleanup
86  ST  Stopped     * VPM script stopped
87  CL  Closed      * VPM device closed


The error read in the above sequence causes RJE to reboot and
rjestat to report line errors. If this type of problem occurs
frequently, the RJE link should be tested, and the hardware
connections to the link should be examined and replaced if necessary.

Error Responses 

As seen in the parts above, the response to most errors is to send a 
NAK. The only exception is when starting up. Whenever a NAK is 
received on either side, it indicates that the previous transmission 
was not properly received. This should be followed by retransmission 
of the previous data. Generally, NAKs should not occur frequently 
and should be followed by recovery. If errors occur frequently or 
NAKs do not cause recovery, the line should be checked for problems. 
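
As a rough illustration of the guidance above, the following Python
sketch tallies trace entries and reports what fraction of them are
error indications. The one-entry-per-line layout (sequence number,
type, information) and the names summarize_trace and error_fraction
are assumptions made for this sketch; they are not a trace file format
or tool defined by RJE.

```python
from collections import Counter

# TR information values that indicate trouble (names as listed in Figure 8-1).
ERROR_EVENTS = {"S-NAK", "R-NAK", "R-JUNK", "R-ERRBLK", "R-SEQERR", "TIMEOUT"}


def summarize_trace(lines):
    """Count TR entries by information value.

    Each line is assumed to look like "66 TR R-JUNK"; this layout is an
    assumption of the sketch, not a documented trace format.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "TR":
            counts[fields[2]] += 1
    return counts


def error_fraction(counts):
    """Return the fraction of counted TR entries that are error indications."""
    total = sum(counts.values())
    errors = sum(n for event, n in counts.items() if event in ERROR_EVENTS)
    return errors / total if total else 0.0


sample = ["63 TR S-ACK", "64 TR R-ACK", "65 TR S-ACK",
          "66 TR R-JUNK", "67 TR S-NAK", "68 TR R-ACK"]
counts = summarize_trace(sample)
print(counts["S-NAK"], counts["R-JUNK"], round(error_fraction(counts), 2))
```

What fraction counts as "frequent" is a local judgment; the sketch
only makes the proportion visible.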

On some IBM systems (e.g., JES2), an I/O error is printed at the 
system console whenever a NAK is received. These I/O errors can 
also be helpful in detecting the problem; however, they will not be 
discussed here as they vary with the system. It is assumed that 
someone in IBM support can assist if needed. 




TYPE   INFORMATION     MEANING

CL     Closed          The virtual protocol machine (VPM) device has
                       been closed.

CL     Clean           The VPM driver is cleaning up for this device.

OP     Opened          The VPM has been successfully opened.

OP     Failed(open)    The open failed because the device was already
                       open.

OP     Failed(dev)     The open failed because the device number was
                       out of range.

OP     Failed(set)     The open failed because the physical device
                       could not be reset.

RR     Buf             The VPM script has returned a receive buffer
                       to the VPM driver.

RX     Buf             The VPM script has returned a transmit buffer
                       to the VPM driver.

RD     num bytes       Num bytes were read from the VPM device by
                       rjerecv.

SC     Exit(num)       The VPM script has terminated. The VPM exit
                       code is num. Exit codes are defined in vpm(7).

ST     Startup         The physical device has been started.

ST     Stopped         The VPM script has been stopped.

TR     Started         The script has started tracing.

TR     R-ACK           A 2-byte acknowledgment (ACK) string has been
                       received from the remote system. This indicates
                       that the previous transmission was properly
                       received.

TR     S-ACK           A 2-byte ACK string has been transmitted to the
                       remote system.

TR     R-NAK           A “not-acknowledged” (NAK) character has been
                       received from the remote system. This indicates
                       that the previous transmission was not properly
                       received.


Figure 8-1. SNOOP Trace Entries (Sheet 1 of 2) 





TYPE   INFORMATION     MEANING

TR     S-NAK           A NAK character has been transmitted to the
                       remote system.

TR     R-ENQ           An enquiry (ENQ) character has been received
                       from the remote system.

TR     S-ENQ           An ENQ character has been transmitted to the
                       remote system.

TR     R-WAIT          The remote machine has requested that no data
                       be transmitted to it.

TR     R-OKBLK         A valid data block was received from the remote
                       machine.

TR     R-ERRBLK        An invalid cyclic redundancy check (CRC) was
                       received with a data block.

TR     R-SEQERR        The block sequence count on a received data
                       block was invalid.

TR     R-JUNK          An invalid data block was received from the
                       remote system.

TR     TIMEOUT         The remote machine did not respond within 3
                       seconds.

TR     S-BLK           A data block has been transmitted to the remote
                       system.

TR     PAUSE-ACK       The script has paused prior to sending an
                       acknowledgment string to the remote system.

WR     num bytes       Num bytes were written to the VPM device by
                       rjexmit.

Figure 8-1. SNOOP Trace Entries (Sheet 2 of 2) 





Chapter 9 

UNIX SYSTEM ACTIVITY PACKAGE 

GENERAL 

This chapter describes the design and implementation of the UNIX 
System Activity Package. The UNIX operating system contains a 
number of counters that are incremented as various system actions 
occur. The system activity package reports UNIX system-wide 
measurements including central processing unit (CPU) utilization, 
disk and tape input/output (I/O) activities, terminal device activity, 
buffer usage, system calls, system switching and swapping, file-access 
activity, queue activity, and message and semaphore activities. 


Throughout this chapter, each reference of the form name(1M),
name(7), or name(8) refers to entries in the UNIX System V
Administrator Reference Manual. References to entries of the form
name(N), where "N" is the number 1 or 6 possibly followed by a
letter, refer to entry name in section N of the UNIX System V User
Reference Manual. If "N" is a number 2 through 5 possibly followed
by a letter, the reference is to entry name in section N of the UNIX
System V Programmer Reference Manual.

The package provides four commands that generate various types of 
reports. Procedures that automatically generate daily reports are 
also included. The five functions of the activity package are: 


• sar(1) command: allows a user to generate system activity
  reports in real-time and to save system activities in a file for
  later usage.

• sag(1G) command: displays system activity in a graphical
  form.

• sadp(1M) command: samples disk activity once every second
  during a specified time interval and reports disk usage and
  seek distance in either tabular or histogram form.

• timex(1): a modified time(1) command that times a
  command and also optionally reports concurrent system
  activity and process accounting activity.

• system activity daily reports: procedures are provided for
  sampling and saving system activities in a data file
  periodically and for generating the daily report from the data
  file.


The system activity information reported by this package is derived
from a set of system counters located in the operating system kernel.
These system counters are described in the part “System Activity 
Counters”. The part “System Activity Commands” describes the 
commands provided by this package. The procedure for generating 
daily reports is given in “Daily Report Generation”. For a 
description of the files used by the system activity package, see 
Attachment 9-1 at the end of this chapter. 


SYSTEM ACTIVITY COUNTERS 

The UNIX operating system manages a number of counters that 
record various activities and provide the basis for the system activity 
reporting system. The data structure for most of these counters is 
defined in the sysinfo structure in /usr/include/sys/sysinfo.h (see 
Attachment 9-2 at the end of this chapter). The system table 
overflow counters are kept in the _syserr structure. The device 
activity counters are extracted from the device status tables. In this 
version, the I/O activity of the following devices is recorded: RP06, 
RM05, RS04, RF11, RK05, RP03, RL02, TM03, and TM11. 


The following paragraphs describe the system activity counters 
sampled by the system activity package. 


Cpu time counters: There are four time counters, one of which is
incremented at each clock interrupt (60 times per second). Depending
on the mode the CPU is in at the interrupt (idle, user, kernel, or
wait for I/O completion), exactly one of the cpu[] counters is
incremented.




Lread and lwrite— The lread and lwrite counters are used to count 
logical read and write requests issued by the system to block devices. 


Bread and bwrite— The bread and bwrite counters are used to 
count the number of times data is transferred between the system 
buffers and the block devices. These actual I/Os are triggered by 
logical I/Os that cannot be satisfied by the current contents of the 
buffers. The ratio of block I/O to logical I/O is a common measure of 
the effectiveness of the system buffering. 


Phread and phwrite— The phread and phwrite counters count 
read and write requests issued by the system to raw devices. 


Swapin and swapout— The swapin and swapout counters are 
incremented for each system request initiating a transfer from or to 
the swap device. More than one request is usually involved in
bringing a process into or out of memory because text and data are
handled separately. Frequently used programs are kept on the swap
device and are swapped in rather than loaded from the file system.
The swapin counter reflects these initial loading operations as well as
resumptions of activity, while the swapout counter reveals the level
of actual “swapping.” The amount of data transferred between the
swap device and memory is measured in blocks and counted by
bswapin and bswapout.


Pswitch and syscall: These counters are related to the
management of multiprogramming. Syscall is incremented every
time a system call is invoked. The numbers of invocations of the
read(2), write(2), fork(2), and exec(2) system calls are kept in
counters sysread, syswrite, sysfork, and sysexec, respectively.
Pswitch counts the times the switcher was invoked, which occurs
when:


1. A system call results in a roadblock.

2. An interrupt occurs that awakens a higher priority process.

3. A 1-second clock interrupt occurs.





Iget, namei, and dirblk: These counters apply to file-access
operations. Iget and namei are the names of UNIX operating system
routines, and the counters record the number of times the respective
routines are called. Namei is the routine that performs file system
path searches. It searches the various directory files to get the
i-number of the file corresponding to a specified path. Iget is the
routine called to locate the inode entry of a file, given its
i-number. It first searches the in-core inode table. If the inode
entry is not in the table, iget gets the inode from the file system
where the file resides and makes an entry in the in-core inode table
for the file. Iget returns a pointer to this entry. Namei calls
iget, but other file access routines also call iget. Therefore, counter
iget is always greater than counter namei.

Counter dirblk records the number of directory block reads issued by
the system. The number of directory blocks read divided by the
number of namei calls gives an estimate of the average path length
of files.
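
The ratio described above can be sketched in Python as follows. The
function name average_path_length and the dictionary layout for the two
counter samples are assumptions made for illustration; only the counter
names dirblk and namei come from the text.

```python
def average_path_length(earlier, later):
    """Estimate directory blocks read per namei call between two samples.

    earlier and later are dicts holding the dirblk and namei counter values
    at the two sample times (the dict layout is an assumption of this sketch).
    """
    namei_calls = later["namei"] - earlier["namei"]
    dir_blocks = later["dirblk"] - earlier["dirblk"]
    if namei_calls == 0:
        return 0.0  # no path searches took place in the interval
    return dir_blocks / namei_calls


# Hypothetical counter values at two sample times.
first = {"namei": 1000, "dirblk": 2400}
second = {"namei": 1400, "dirblk": 3400}
estimate = average_path_length(first, second)
print("directory blocks per namei call: %.2f" % estimate)
```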

Runque, runocc, swpque, and swpocc: These counters are used
to record queue activities. They are implemented in the clock.c
routine. At every 1-second interval, the clock routine examines the
process table to see whether any processes are in core and in ready
state. If so, the counter runocc is incremented and the number of
such processes is added to counter runque. While examining the
process table, the clock routine also checks whether any processes on
the swap device are in ready state. The counter swpocc is
incremented if the swap queue is occupied, and the number of
processes in the swap queue is added to counter swpque.


Readch and writech— The readch and writech counters record the 
total number of bytes (characters) transferred by the read and 
write system calls, respectively. 


Monitoring terminal device activities: There are six counters
monitoring terminal device activities. Rcvint, xmtint, and mdmint
count hardware interrupt occurrences for the receiver, transmitter,
and modem, respectively. Rawch, canch, and outch count the
number of characters in the raw queue, canonical queue, and output
queue. Characters generated by devices operating in the cooked
mode, such as terminals, are counted in both rawch and (as edited) in
canch; but characters from raw devices, such as communication
processors, are counted only in rawch.




Msg and sema counters— These counters record message sending 
and receiving activities and semaphore operations, respectively. 


Monitoring I/O activities: For each disk or tape drive, four
counters are kept in the device status table. Counter io_ops is
incremented when an I/O operation has occurred on the device. It
includes block I/O, swap I/O, and physical I/O. Io_bcnt counts the
amount of data transferred between the device and memory in
512-byte units. Io_act and io_resp measure the active time and
response time of a device in time ticks, summed over all I/O requests
that have completed for each device. The device active time includes
the device seeking, rotating, and data transferring times, while the
response time of an I/O operation runs from the time the I/O request
is queued to the device to the time the I/O completes.


Inodeovf, fileovf, textovf, and procovf— These counters are 
extracted from the _syserr structure. When an overflow occurs in any of
the inode, file, text, and process tables, the corresponding overflow 
counter is incremented. 


SYSTEM ACTIVITY COMMANDS 

The system activity package provides three commands for generating
various system activity reports and one command for profiling disk
activities. These tools facilitate observation of system activity
during:


• A controlled stand-alone test of a large system

• An uncontrolled run of a program to observe the operating
  environment

• Normal production operation.


Commands sar and sag permit the user to specify a sampling
interval and number of intervals for examining system activity and
then to display the observed level of activity in tabular or graphical
form. The timex command reports the amount of system activity
that occurred during the precise period of execution of a timed
command. The sadp command allows the user to establish a
sampling period during which access location and seek distance on
specified disks are recorded and later displayed as a tabular
summary or as a histogram.


The “sar” Command 

The sar command can be used in the following two ways: 


• When the frequency arguments t and n are specified, it
  invokes the data collection program sadc to sample the system
  activity counters in the operating system every t seconds for n
  intervals and generates system activity reports in real-time.
  Generally, it is desirable to include the option to save the
  sampled data in a file for later examination. The format of
  the data file is shown in sar(1M). In addition to the system
  counters, a time stamp is also included. It gives the time at
  which the sample was taken.

• If no frequency arguments are supplied, it generates system
  activity reports for a specified time interval from an existing
  data file that was created by sar at an earlier time.


A convenient usage is to run sar as a background process saving its 
samples in a temporary file but sending its standard output to 
/dev/null. Then an experiment is conducted after which the system 
activity is extracted from the temporary file. The sar(1) manual
entry describes the usage and lists various types of reports. 
Attachment 9-3 (at the end of this chapter) gives the formula for 
deriving each reported item. 


The “sag” Command 

Sag displays system activity data graphically. It relies on the data
file produced by a prior run of sar, after which any column of data or
combination of columns of data of the sar report can be plotted.
A fairly simple but powerful command syntax allows the
specification of cross plots or time plots. Data items are selected
using the sar column header names. The sag(1G) manual entry
describes its options and usage. The system activity graphical
program invokes the graphics(1G) and tplot(1G) commands to have the
graphical output displayed on any of the terminal types supported by
tplot.


The “timex” Command 

The timex command is an extension of the time(1) command.
Without options, timex behaves like time. In addition to giving the
time information, it can also print a system activity report and a
process accounting report. For all the options available, refer to the
manual entry timex(1). It should be emphasized that the user and
sys times reported in the second and third lines are for the measured
process itself including all its children, while the remaining data
(including the cpu user % and cpu sys %) are for the entire system.


While the normal use of timex will probably be to measure a single
command, multiple commands can also be timed either by combining
them in an executable file and timing it or by typing:


timex sh -c "cmd1; cmd2; ..."

This establishes the necessary parent-child relationships to correctly
extract the user and system times consumed by cmd1, cmd2, ... (and
the shell).


The “sadp” Command 

Sadp is a user level program that can be invoked independently by 
any user. It requires no storage or extra code in the operating 
system and allows the user to specify the disks to be monitored. The 
program is reawakened every second, reads system tables from 
/dev/kmem, and extracts the required information. Because of the 1 
second sampling, only a small fraction of disk requests are observed; 
however, comparative studies have shown that the statistical 
determination of disk locality is adequate when sufficient samples 
are collected. 


In the operating system, there is an iobuf for each disk drive. It
contains two pointers, which are the head and tail of the I/O active
queue for the device. The actual requests in the queue may be found
in three buffer header pools: system buffer headers for block I/O
requests, physical buffer headers for physical I/O requests, and swap
buffer headers for swap I/O. Each buffer header has a forward
pointer that points to the next request in the I/O active queue and a
backward pointer that points to the previous request.


Sadp snapshots the iobuf of the monitored device and the three
buffer header pools once every second during the monitoring period.
It then traces the requests in the I/O queue and records the disk
access location and seek distance in buckets of 8-cylinder increments.
At the end of the monitoring period, it prints out the sampled data.
The output of sadp can be used to balance load among disk drives and
to rearrange the layout of a particular disk pack. The usage of this
command is described in manual entry sadp(1M).
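
The bucketing idea can be sketched in Python as shown below. This is
not the sadp source; the function names are hypothetical, and treating
the seek distance as the cylinder difference between successive
requests in the queue is an assumption of the sketch. Only the
8-cylinder bucket width comes from the text.

```python
from collections import Counter

BUCKET_CYLINDERS = 8  # bucket width stated in the text


def bucket_range(cylinders):
    """Return the (low, high) cylinder range of the bucket holding a value."""
    low = (cylinders // BUCKET_CYLINDERS) * BUCKET_CYLINDERS
    return (low, low + BUCKET_CYLINDERS - 1)


def seek_histogram(access_cylinders):
    """Bucket access locations and the seek distances between them.

    access_cylinders is a hypothetical list of cylinder numbers in the
    order the sampled requests appear in the I/O queue.
    """
    locations = Counter(bucket_range(c) for c in access_cylinders)
    seeks = Counter(
        bucket_range(abs(b - a))
        for a, b in zip(access_cylinders, access_cylinders[1:])
    )
    return locations, seeks


locations, seeks = seek_histogram([3, 10, 12, 100, 101])
for bucket in sorted(seeks):
    print("seek %3d-%3d cylinders: %d" % (bucket[0], bucket[1], seeks[bucket]))
```

A histogram dominated by the lowest seek bucket would suggest good
locality; many long seeks might justify rearranging the disk pack.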


DAILY REPORT GENERATION 

The previous part described the commands available to users to 
initiate activity observations. It is probably desirable for each 
installation to routinely monitor and record system activity in a 
standard way for historical analysis. This part describes the steps 
that a system administrator may follow to automatically produce a 
standard daily report of system activity. 


Facilities 


• sadc: The executable module of sadc.c (see Attachment 9-1
  at the end of this chapter) which reads system counters from
  /dev/kmem and records them to a file. In addition to the file
  argument, two frequency arguments are usually specified to
  indicate the sampling interval and number of samples to be
  taken. In case no frequency arguments are given, it writes a
  dummy record in the file to indicate a system restart.

• sa1: The shell procedure that invokes sadc to write system
  counters in the daily data file /usr/adm/sa/sadd, where dd
  represents the day of the month. It may be invoked with
  sampling interval and iterations as arguments.

• sa2: The shell procedure that invokes the sar command to
  generate daily report /usr/adm/sa/sardd from the daily data
  file /usr/adm/sa/sadd. It also removes daily data files and
  report files after 7 days. The starting and ending times and all
  report options of sar are applicable to sa2.


Suggested Operational Setup 

It is suggested that cron(1M) control the normal data collection
and report generation operations. For example, the sample entries in
/usr/spool/cron/crontab/sys:

0 * * * 0,6 /usr/lib/sa/sa1
0 18-7 * * 1-5 /usr/lib/sa/sa1
0 8-17 * * 1-5 /usr/lib/sa/sa1 1200 3

would cause the data collection program sadc to be invoked every
hour on the hour. Moreover, depending on the arguments presented,
each invocation writes either one record or three records at
20-minute intervals to the data file. Therefore, under the control of
cron(1M), the data file is written every 20 minutes between 8:00 and
18:00 on weekdays and hourly at other times.
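
To make the resulting weekday schedule concrete, the following Python
sketch lists the times at which records would be written. It assumes
that an invocation with arguments "1200 3" writes its first record when
it starts and the remaining two at 1200-second intervals, and that an
invocation without arguments writes a single record; the function
names are hypothetical.

```python
def record_times(start_hour, interval_seconds=0, count=1):
    """Return "HH:MM" times at which one hourly invocation writes records.

    Assumes the first record is written on the hour and later ones
    follow every interval_seconds (an assumption of this sketch).
    """
    times = []
    for i in range(count):
        minutes = start_hour * 60 + (i * interval_seconds) // 60
        times.append("%02d:%02d" % divmod(minutes, 60))
    return times


def weekday_schedule():
    """Combine the two weekday crontab entries into one list of times."""
    schedule = []
    for hour in range(24):
        if 8 <= hour <= 17:
            # prime time entry: three records, 1200 seconds apart
            schedule.extend(record_times(hour, 1200, 3))
        else:
            # off-hours entry (hours 18-7): one record per invocation
            schedule.extend(record_times(hour))
    return schedule


schedule = weekday_schedule()
print(len(schedule), "records per weekday")
print(schedule[8:14])
```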

Note that data samples are taken more frequently during prime time 
on weekdays to make them available for a finer and more detailed 
graphical display. It is suggested that sa1 be invoked hourly rather
than invoking it once every day; this ensures that if the system
crashes, data collection will be resumed within an hour after the
system is restarted.


Because system activity counters restart from zero when the system 
is restarted, a special record is written on the data file to reflect this 
situation. This process is accomplished by invoking sadc with no 
frequency arguments within /etc/rc when going to multiuser state: 


su adm -c "/usr/lib/sa/sadc /usr/adm/sa/sa`date +%d`"




Cron(1M) also controls the invocation of sar to generate the daily
report via shell procedure sa2. One may choose the time period the 
daily report is to cover and the groups of system activity to be 
reported. For instance, if: 

0 20 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:00 -i 3600 -uybd 


is an entry in /usr/spool/cron/crontab/sys, cron will execute the
sar command to generate daily reports from the daily data file at
20:00 on weekdays. The daily report covers the CPU utilization,
terminal device activity, buffer usage, and device activity every hour
from 8:00 to 18:00.


If disk space runs short, or for any other reason, these
data files and report files can be removed by the superuser. The
manual entry sar(1M) describes the daily report generation
procedure.




ATTACHMENT 9-1 


The source files and shell programs of the system activity package 
are in directory /usr/src/cmd/sa. 


sa.h             The system activity header file defines the
                 structure of data file and device information
                 for measured devices. It is included in sadc.c,
                 sar.c, and timex.c.

sadc.c           The data collection program that accesses
                 /dev/kmem to read the system activity
                 counters and writes data either on standard
                 output or on a binary data file. It is invoked
                 by the sar command generating a real-time
                 report. It is also invoked indirectly by entries
                 in /usr/spool/cron/crontab/sys to collect
                 system activity data.

sar.c            The report generation program invokes sadc
                 to examine system activity data, generates
                 reports in real-time, and saves the data to a
                 file for later usage. It may also generate
                 system activity reports from an existing data
                 file. It is invoked indirectly by cron to
                 generate daily reports.

saghdr.h         The header file for saga.c and sagb.c. It
                 contains data structures and variables used by
                 saga.c and sagb.c.

saga.c & sagb.c  The graph generation program that first
                 invokes sar to format the data of a data file in
                 a tabular form and then displays the sar data
                 in graphical form.

sa1.sh           The shell procedure that invokes sadc to write
                 data file records. It is activated by entries in
                 /usr/spool/cron/crontab/sys.

sa2.sh           The shell procedure that invokes sar to
                 generate the report. It also removes the daily
                 data files and daily report files after a week.
                 It is activated by an entry in
                 /usr/spool/cron/crontab/sys on weekdays.

timex.c          The program that times a command and
                 generates a system activity or process
                 accounting report.

sadp.c           The program that samples and reports disk
                 activities.




ATTACHMENT 9-2 


struct sysinfo
{
        time_t  cpu[4];
#define CPU_IDLE        0
#define CPU_USER        1
#define CPU_KERNAL      2
#define CPU_WAIT        3
        time_t  wait[3];
#define W_IO            0
#define W_SWAP          1
#define W_PIO           2
        long    bread;
        long    bwrite;
        long    lread;
        long    lwrite;
        long    phread;
        long    phwrite;
        long    swapin;
        long    swapout;
        long    bswapin;
        long    bswapout;
        long    pswitch;
        long    syscall;
        long    sysread;
        long    syswrite;
        long    sysfork;
        long    sysexec;
        long    runque;
        long    runocc;
        long    swpque;
        long    swpocc;
        long    iget;
        long    namei;
        long    dirblk;
        long    readch;
        long    writech;
        long    rcvint;
        long    xmtint;
        long    mdmint;
        long    rawch;
        long    canch;
        long    outch;
        long    msg;
        long    sema;
};





ATTACHMENT 9-3 


The derivation of the reported items is given in this attachment.
Each item discussed below is computed from the difference between
counter values sampled at two distinct times, t1 and t2.

CPU Utilization


%-of-cpu-x = cpu-x / (cpu-idle + cpu-user + cpu-kernel + cpu-wait) * 100
where cpu-x is cpu-idle, cpu-user, cpu-kernel (cpu-sys), or cpu-wait.
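
A minimal Python sketch of the CPU utilization formula follows. The
function name and the use of plain four-element lists for the two
cpu[] samples (in the order idle, user, kernel, wait) are assumptions
made for illustration.

```python
def cpu_percentages(earlier, later):
    """Return percentages for idle, user, kernel, and wait time.

    earlier and later are four-element sequences of cpu[] tick counts
    sampled at two times, indexed as idle, user, kernel, wait.
    """
    deltas = [b - a for a, b in zip(earlier, later)]
    total = sum(deltas)
    if total == 0:
        return [0.0, 0.0, 0.0, 0.0]  # no clock ticks between the samples
    return [100.0 * d / total for d in deltas]


# Hypothetical samples taken 20 seconds apart at 60 ticks per second.
sample_t1 = [1000, 500, 400, 100]
sample_t2 = [1600, 800, 640, 160]
idle, user, kernel, wait = cpu_percentages(sample_t1, sample_t2)
print("idle %.1f user %.1f kernel %.1f wait %.1f" % (idle, user, kernel, wait))
```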

Cache Hit Ratio 


%-of-cache-I/O = (logical-I/O - block-I/O) / logical-I/O * 100

where cache I/O is cache read or cache write. 
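
The cache hit ratio can be sketched in Python as follows. The function
name is hypothetical, and returning None when no logical I/O occurred
in the interval is a choice made for this sketch rather than
documented sar behavior.

```python
def cache_hit_percent(logical_io, block_io):
    """Percentage of logical I/O requests satisfied from the buffers.

    logical_io and block_io are counter differences over one interval
    (for example lread and bread for reads, lwrite and bwrite for writes).
    """
    if logical_io == 0:
        return None  # ratio is undefined when there was no logical I/O
    return 100.0 * (logical_io - block_io) / logical_io


# Hypothetical interval: 1000 logical reads, 100 of which went to the device.
read_cache = cache_hit_percent(1000, 100)
print("read cache hit ratio: %.1f%%" % read_cache)
```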

Disk or Tape I/O Activity 


%-of-busy = I/O-active / (t2 - t1) * 100;
avg-queue-length = I/O-resp / I/O-active; 
avg-wait = (I/O-resp - I/O-active) / I/O-ops; 
avg-service-time = I/O-active / I/O-ops. 
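
The four device formulas can be worked through with a short Python
sketch. The function name and the returned dictionary keys are
hypothetical, and the sketch assumes the elapsed interval (t2 - t1) is
expressed in the same time ticks as the io_act and io_resp differences.

```python
def device_activity(io_ops, io_act, io_resp, elapsed_ticks):
    """Apply the disk or tape formulas to counter differences.

    io_ops, io_act, and io_resp are differences of the per-device
    counters over the interval; elapsed_ticks is t2 - t1 in ticks
    (same unit as io_act and io_resp, an assumption of this sketch).
    """
    if elapsed_ticks <= 0:
        raise ValueError("elapsed_ticks must be positive")
    return {
        "pct_busy": 100.0 * io_act / elapsed_ticks,
        "avg_queue_length": io_resp / io_act if io_act else 0.0,
        "avg_wait": (io_resp - io_act) / io_ops if io_ops else 0.0,
        "avg_service_time": io_act / io_ops if io_ops else 0.0,
    }


# Hypothetical interval of 1200 ticks with 100 completed operations.
stats = device_activity(io_ops=100, io_act=300, io_resp=600, elapsed_ticks=1200)
for name in sorted(stats):
    print("%-18s %.2f" % (name, stats[name]))
```

The average wait and service time come out in ticks per operation;
converting them to milliseconds would need the clock rate.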


Queue Activity 

avg-x-queue-length = x-queue / x-queue-occupied-time;

%-of-x-queue-occupied-time = x-queue-occupied-time / (t2 - t1);


where x-queue is run queue or swap queue. 
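
A small Python sketch of the queue formulas is given below. The
function name is hypothetical. The sketch multiplies the occupancy
ratio by 100 to express it as a percentage; that scaling is an
assumption, since the formula above does not show it, and the elapsed
interval is taken in seconds because the queue counters are updated
once per second.

```python
def queue_activity(queue_delta, occupied_delta, elapsed_seconds):
    """Return (average queue length, percent of time the queue was occupied).

    queue_delta is the difference of runque (or swpque) and
    occupied_delta the difference of runocc (or swpocc) over an
    interval of elapsed_seconds.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    average = queue_delta / occupied_delta if occupied_delta else 0.0
    # Scaling by 100 is an assumption made to report a percentage.
    percent = 100.0 * occupied_delta / elapsed_seconds
    return average, percent


# Hypothetical 60-second interval: queue occupied in 12 of the 60 samples.
avg_len, pct_occupied = queue_activity(30, 12, 60)
print("avg length %.1f, occupied %.0f%% of the time" % (avg_len, pct_occupied))
```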

The Rest of System Activity 

avg-rate-of-x = x / (t2 - t1)

where x is swap in/out, blks swapped in/out, terminal device 
activities, read/write characters, block read/write, logical read/write, 
process switch, system calls, read/write, fork/exec, iget, namei, 
directory blocks read, disk/tape I/O activities, message, or semaphore 
activities. 






Chapter 10 

UUCP ADMINISTRATION 

INTRODUCTION 

This chapter describes how a uucp network is set up, the format of 
control files, and administrative procedures. Administrators should 
be familiar with the manual pages for each of the uucp related 
commands. 


PLANNING 

In setting up a network of UNIX systems, there are several 
considerations that should be taken into account before configuring 
each system on the network. The following parts attempt to outline 
the most important considerations. 


Extent of the Network 

Some basic decisions about access to processors in the network must 
be made before attempting to set up the configuration files. If an 
administrator has control over only one processor and an existing 
network is being joined, then the administrator must decide what 
level of access should be granted to other systems. The other 
members of the network must make a similar decision for the new 
system. The UNIX system password mechanism is used to grant 
access to other systems. The file /usr/lib/uucp/USERFILE restricts 
access by other systems to parts of the file system tree, and the file 
/usr/lib/uucp/L.sys on the local processor determines how many 
other systems on the network can be reached. 


When setting up more than one processor, the administrator has 
control of a larger portion of the network and can make more 
decisions about the setup. For example, the network can be set up as 
a private network where only those machines under the direct control 
of the administrator can access each other. Granting no access to 
machines outside the network can be done if security is paramount; 
however, this is usually impractical. Very limited access can be
granted to outside machines by each of the systems on the private
network. Alternatively, access to/from the outside world can be 
confined to only one processor. This is frequently done to minimize 
the effort in keeping access information (passwords, phone numbers, 
login sequences, etc.) updated and to minimize the number of security 
holes for the private network. 


Hardware and Line Speeds 

There are only two supported means of interconnection by uucp(1):

1. Direct connection using a null modem. 

2. Connection over the Direct Distance Dialing (DDD) network. 


In choosing hardware, the equipment used by other processors on the 
network must be considered. For example, if some systems on the 
network have only 103-type (300-baud) data sets, then communication 
with them is not possible unless the local system has a 300-baud data 
set connected to a calling unit. (Most data sets available on systems 
are 1200-baud.) If hard-wired connections are to be used between 
systems, then the distance between systems must be considered since 
a null modem cannot be used when the systems are separated by 
more than several hundred feet. The limit for communication at 
9600-baud is about 800 to 1000 feet. However, the RS232 specification 
and Western Electric Support Groups only allow for less than 50 feet. 
Limited distance modems must be used beyond 50 feet as noise on the 
lines becomes a problem. 


Maintenance and Administration 

There is a minimum amount of maintenance that must be provided 
on each system to keep the access files updated, to ensure that the 
network is running properly, and to track down line problems. When 
more than one system is involved, the job becomes more difficult 
because there are more files to update and because users are much 
less patient when failures occur between machines that are under 
local control. 




UUCP SOFTWARE 

Figure 10-1 (at the end of this chapter) is an illustration of the 
daemons used by the uucp network to communicate with another 
system. The uucp(1) or uux(1) command queues users' requests and
spawns the uucico daemon to call another system. Figure 10-2 (at
the end of this chapter) illustrates the structure of uucico and the 
tasks that it performs in communicating with another system. 
Uucico initiates the call to another system and performs the file 
transfer. On the receiving side, uucico is invoked to receive the 
transfer. Remote execution jobs are actually done by transferring a 
command file to the remote system and invoking a daemon (uuxqt) 
to execute that command file and return the results. 


INSTALLATION 

The uucp(1) package is delivered as part of the standard UNIX
system distribution. It resides in its own subdirectory (called uucp)
in the commands area and has its own make file (uucp.mk). The
uucp package is installed as part of the normal distribution;
however, if it must be reinstalled for any reason, then the command


make -f uucp.mk install

should be executed.


Object Modules 

The following object modules are installed as part of the uucp make 
procedure. 

1. Uucp— The file transfer command. 

2. Uux— The remote execution command. 

3. Uucico— The uucp network daemon. 

4. Uustat— Network status command. 





5. Uuclean— Cleanup command. 

6. Uusub— The command for monitoring and creating a 
subnetwork. 

7. Uuxqt— The remote execution daemon. 

8. Uudemon.day — A shell procedure that is invoked each day to 
maintain the network. Shell scripts for execution each week 
(uudemon.wk) and each hour (uudemon.hr) are also 
distributed. 


Password File 

To allow remote systems to call the local system, password entries 
must be made for any uucp logins. For example, 


nuucp:zaaAA:6:1:UUCP.Admin:/usr/spool/uucppublic:/usr/lib/uucp/uucico


Note that the uucico daemon is used for the shell, and the spool 
directory is used as the working directory. 
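
As an illustration of how such an entry breaks down, the following
Python sketch splits a password file line into its seven fields and
checks whether the login shell is uucico. The names PasswdEntry,
parse_passwd_entry, and is_uucp_login are hypothetical, and the sample
line is the entry above with the group ID written as the digit 1.

```python
from collections import namedtuple

PasswdEntry = namedtuple(
    "PasswdEntry", "login password uid gid comment home shell")


def parse_passwd_entry(line):
    """Split one colon-separated password file line into named fields."""
    fields = line.strip().split(":")
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d" % len(fields))
    login, password, uid, gid, comment, home, shell = fields
    return PasswdEntry(login, password, int(uid), int(gid), comment, home, shell)


def is_uucp_login(entry):
    """True when the login shell field names the uucico daemon."""
    return entry.shell.split("/")[-1] == "uucico"


entry = parse_passwd_entry(
    "nuucp:zaaAA:6:1:UUCP.Admin:/usr/spool/uucppublic:/usr/lib/uucp/uucico")
print(entry.login, entry.home, is_uucp_login(entry))
```

An entry whose last field is empty uses the standard shell, so
is_uucp_login returns False for it.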


There must also be an entry in the passwd file for a uucp
administrative login. This login is the owner of all the uucp object
and spooled data files and is usually "uucp". For example, the
following is an entry in /etc/passwd for this administrative login:


uucp:zAvLCKp:5:1:UUCP.Admin:/usr/lib/uucp:


Note that the standard shell is used instead of uucico. If an owner
other than "uucp" is chosen, the make file for uucp
(/usr/src/cmd/uucp/uucp.mk) must be edited. The line
"OWNER=uucp" must be changed to reflect the new owner login.




Lines File 

The file /usr/lib/uucp/L-devices contains the list of all lines that are 
directly connected to other systems or are available for calling other 
systems. The file contains the attributes of the lines and whether the 
line is a permanent connection or can call via a dialer. The format of 
the file is 

type line call-device speed protocol 

where each field is 

type         Two keywords are used to describe whether a line 
             is directly connected to another system (DIR) or 
             uses an automatic calling unit (ACU). An X.25 
             permanent virtual circuit would use the DIR 
             keyword. 

line         This is the device name for the line (e.g., ttyab for 
             a direct line, cul0 for a line connected to an ACU). 

call-device  If the ACU keyword is specified, this field 
             contains the device name of the ACU. Otherwise, 
             the field is ignored; however, a placeholder must 
             be used in this field so that the protocol field can 
             be interpreted. 

speed        The line speed that the connection is to run at. 
             (The speed field is currently ignored if an X.25 
             link is used.) 

protocol     This is an optional field that need only be filled 
             in if the connection is for a protocol other than 
             the default terminal protocol. The X.25 protocol 
             is the only other protocol supported and the single 
             character x is used to select this protocol. 

The following entries illustrate various types of connections: 

DIR ttyab 0 9600 
ACU cul0 cua0 1200 
DIR x25.s0 0 300 x 

The first entry is for a hard-wired line running at 9600 baud between 
two systems. Note that the call-device field is zero. The second entry 
is for a line with a 1200-baud ACU. The last entry is for an X.25 
synchronous direct connection between systems. Note that the 
protocol field is filled in and that the call-device and line speed 
fields are meaningless. 
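
The field layout described above can be sketched as a small parser. 
This is a minimal illustration in Python, assuming whitespace-separated 
fields exactly as in the three sample entries; the function name and 
the dictionary keys are hypothetical and are not part of uucp. 

```python
# Illustrative sketch: parse one L-devices entry of the form
#     type line call-device speed [protocol]
def parse_l_devices_line(text):
    """Return a dict describing one L-devices entry."""
    parts = text.split()
    if len(parts) not in (4, 5):
        raise ValueError("expected 4 or 5 fields: %r" % text)
    kind, line, call_device, speed = parts[:4]
    if kind not in ("DIR", "ACU"):
        raise ValueError("type must be DIR or ACU: %r" % kind)
    return {
        "type": kind,
        "line": line,
        # The call-device field is only a placeholder unless type is ACU.
        "call_device": call_device if kind == "ACU" else None,
        "speed": int(speed),
        # The optional fifth field selects a non-default protocol (x = X.25).
        "protocol": parts[4] if len(parts) == 5 else None,
    }


for sample in ("DIR ttyab 0 9600", "ACU cul0 cua0 1200", "DIR x25.s0 0 300 x"):
    print(parse_l_devices_line(sample))
```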


Naming Conventions 

When naming lines that are directly connected between systems or 
that are dedicated to calling other systems, it is often useful to 
choose a naming scheme that conveys the use of the line. In the 
earlier examples, the name ttyab is used for the line that directly 
connects two systems named a and b. Similarly, lines associated with 
calling units are best given names that relate them to the calling unit 
(note the names cul0 and cua0 to specify the line and calling unit, 
respectively). 


System File 

Each entry in the L.sys file represents a system that can be called 
by the local uucp programs. More than one line may be present for a 
particular system. In this case, the additional lines represent 
alternative communication paths that will be tried in sequential 
order. The fields are described below. 

system name  Name of the remote system. 

time         This is a string that indicates the days-of-week 
             and times-of-day when the system should be 
             called (e.g., MoTuTh0800-1730). 

             The day portion may be a list containing Su, Mo, 
             Tu, We, Th, Fr, Sa; or it may be Wk for any 
             week-day or Any for any day. The time should be 
             a range of times (e.g., 0800-1230). If no time 
             portion is specified, any time of day is assumed to 
             be allowed for the call. Note that a time range 
             that spans 0000 is permitted; 0800-0600 means all 
             times are allowed other than times between 6 and 
             8 am. An optional subfield is available to specify 
             the minimum time (minutes) before a retry 
             following a failed attempt. The subfield separator 
             is a comma (e.g., Any,9 means call any time but 
             wait at least 9 minutes before retrying the call 
             after a failure has occurred). 

device       This is either ACU or the hard-wired device name 
             to be used for the call. For the hard-wired case, 
             the last part of the special file name is used (e.g., 
             tty0). 

class        This is usually the line speed for the call (e.g., 
             300). 

phone        The phone number is made up of an optional 
             alphabetic abbreviation (dialing prefix) and a 
             numeric part. The abbreviation should be one 
             that appears in the L-dialcodes file (e.g., mh1212, 
             boston555-1212). For the hard-wired devices, this 
             field contains the same string as used for the 
             device field. 

login        The login information is given as a series of fields 
             and subfields in the format 

             [ expect send ] . . . 

             where expect is the string expected to be read and 
             send is the string to be sent when the expect 
             string is received. 

             The expect field may be made up of subfields of 
             the form 

             expect[-send-expect] . . . 

             where the send is sent if the prior expect is not 
             successfully read and the expect following the 
             send is the next expected string. (For example, 
             login--login will expect login; if it gets it, the 
             program will go on to the next field; if it does not 
             get login, it will send a null string followed by a 
             new-line, then expect login again.) If no characters 
             are initially expected from the remote machine, the 
             string "" (a null string) should be used in the 
             first expect field. 

             There are two special names available to be sent 
             during the login sequence. The string EOT will 
             send an EOT character, and the string BREAK 
             will try to send a BREAK character. (The 
             BREAK character is simulated using line speed 
             changes and null characters and may not work on 
             all devices and/or systems.) A number from 1 to 9 
             may follow the BREAK (e.g., BREAK1 will send 1 
             null character instead of the default of 3). Note 
             that BREAK1 usually works best for 300-/1200- 
             baud lines. 
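
The rules given for the time field (day list, optional time range 
that may span 0000, and optional retry subfield) can be sketched as 
follows. This Python fragment is only an illustration of the 
description above, not the logic used by uucico. The function names 
are hypothetical, and treating both ends of the range as allowed 
times is an assumption. 

```python
import re

DAYS = ("Su", "Mo", "Tu", "We", "Th", "Fr", "Sa")
WEEKDAYS = ("Mo", "Tu", "We", "Th", "Fr")


def parse_time_field(field):
    """Return (days, start, end, retry_minutes) for a time field."""
    spec, _, retry = field.partition(",")
    m = re.fullmatch(r"([A-Za-z]+)(?:(\d{4})-(\d{4}))?", spec)
    if m is None:
        raise ValueError("bad time field: %r" % field)
    names, start, end = m.groups()
    if names == "Any":
        days = set(DAYS)
    elif names == "Wk":
        days = set(WEEKDAYS)
    else:
        # A day list such as MoTuTh is cut into two-letter names.
        days = {names[i:i + 2] for i in range(0, len(names), 2)}
        if not days <= set(DAYS):
            raise ValueError("bad day list: %r" % names)
    return days, start, end, (int(retry) if retry else None)


def call_allowed(field, day, hhmm):
    """True if a call may be placed on `day` (e.g. "Tu") at `hhmm`."""
    days, start, end, _ = parse_time_field(field)
    if day not in days:
        return False
    if start is None:              # no time portion: any time of day
        return True
    if start <= end:               # ordinary range, e.g. 0800-1730
        return start <= hhmm <= end
    return hhmm >= start or hhmm <= end   # range that spans 0000


print(call_allowed("MoTuTh0800-1730", "Tu", "0900"))
print(call_allowed("Any0800-0600", "Fr", "0700"))
print(parse_time_field("Any,9")[3])
```

Times are compared as four-digit strings, which orders them correctly 
because every value is zero-padded to the same length. 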

There are several character strings that cause specific actions when 
they are a part of a string sent during the login sequence. 

\s Send a space character. 

\d Delay one second before sending or reading more 
characters. 

\c If at the end of a string, suppress the new-line that is 
normally sent. Ignored otherwise. 

\N Send a null character. 

These character strings are useful for making uucp communicate via 
direct lines to data switches. 

A typical entry in the L.sys file would be 


sys Any ACU 300 mh7654 login uucp ssword: word 




The expect algorithm matches all or part of the input string as 
illustrated in the password field above. 
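
The way the login portion of an entry breaks down into expect and 
send strings, and the partial matching of an expect string, can be 
sketched in Python as shown below. This is an illustrative reading of 
the description above and not the uucico implementation. The function 
names are hypothetical, and the sketch assumes that a hyphen never 
occurs inside an expect or send string. 

```python
def split_expect(expect_field):
    """Split expect[-send-expect]... into (first_expect, retries).

    retries is a list of (send, expect) pairs that are tried in order
    when the preceding expect string is not read successfully.
    """
    parts = expect_field.split("-")
    first, rest = parts[0], parts[1:]
    if len(rest) % 2:
        raise ValueError("unpaired subfield in %r" % expect_field)
    retries = [(rest[i], rest[i + 1]) for i in range(0, len(rest), 2)]
    return first, retries


def parse_login(fields):
    """Pair up the alternating expect and send fields of an entry."""
    if len(fields) % 2:
        raise ValueError("login fields must come in expect/send pairs")
    return [(split_expect(fields[i]), fields[i + 1])
            for i in range(0, len(fields), 2)]


def expect_matches(expect, received):
    """An expect string matches all or part of the input read so far."""
    return expect == '""' or expect in received


# Login portion of the typical entry shown above.
print(parse_login("login uucp ssword: word".split()))
print(split_expect("login--login"))
print(expect_matches("ssword:", "Password:"))
```

With the typical entry, the expect string ssword: is satisfied by a 
prompt of either Password: or password:, which is why the leading 
letter is left out. 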


Dialing Prefixes 

The L-dialcodes file contains the dial-code abbreviations used in the 
L.sys file (e.g., py, mh, boston). The entry format is 

abb dial-seq 

where abb is the abbreviation and dial-seq is the dial sequence to call 
that location. 

The line 

py 165- 

would be set up so that entry py7777 would send 165-7777 to the dial 
unit. 
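
The expansion described above amounts to a table lookup on the 
alphabetic part of the phone field. The following Python sketch 
illustrates it; the function names are hypothetical, and raising an 
error for an abbreviation that is not in the table is an assumption 
of this example rather than documented uucp behavior. 

```python
import re


def load_dialcodes(text):
    """Build {abbreviation: dial sequence} from 'abb dial-seq' lines."""
    codes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            codes[parts[0]] = parts[1]
    return codes


def expand_phone(phone, codes):
    """Replace a leading alphabetic abbreviation with its dial sequence."""
    abb, number = re.fullmatch(r"([A-Za-z]*)(.*)", phone).groups()
    if not abb:
        return number            # purely numeric phone field
    if abb not in codes:
        raise KeyError("no dial code for %r" % abb)
    return codes[abb] + number


codes = load_dialcodes("py 165-\n")
print(expand_phone("py7777", codes))
```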


Userfile 

This file contains user accessibility information. It specifies four 
types of constraints: 

1. Files that can be accessed by a normal user of the local machine. 

2. Files that can be accessed from a remote computer. 

3. Login name used by a particular remote computer. 

4. Whether a remote computer should be called back in order to 
confirm its identity. 

Each line in the file has the format 


login,sys [ c ] pathname [ pathname ] . . . 



where 

login is the login name for a user or the remote computer. 

sys is the system name for a remote computer. 

c is the optional call-back required flag. 

pathname is a pathname prefix that is acceptable for sys. 


The constraints are implemented as follows: 

1. When the program is obeying a command stored on the local 
machine, the pathnames allowed are those given on the first line 
in the USERFILE that has the login name of the user who 
entered the command. If no such line is found, the first line with 
a null login name is used. 

2. When the program is responding to a command from a remote 
machine, the pathnames allowed are those given on the first line 
in the file that has the system name that matches the remote 
machine. If no such line is found, the first one with a null 
system name is used. 

3. When a remote computer logs in, the login name that it uses 
must appear in the USERFILE. There may be several lines with 
the same login name but one of them must either have the name 
of the remote system or must contain a null system name. 

4. If the line matched in (3) contains a “c”, the remote machine is 
called back before any transactions take place. 


The line 


u,m /usr/xyz 


allows machine m to log in with name u and request the transfer of 
files whose names start with /usr/xyz. The line 

you, /usr/you 

allows the ordinary user you to issue commands for files whose names 
start with /usr/you. (This type of restriction is seldom used.) The 
lines 

u,m /usr/xyz /usr/spool 
u, /usr/spool 

allow any remote machine to log in with name u. If its system name 
is not m, it can only ask to transfer files whose names start with 
/usr/spool. If it is system m, it can send files from paths /usr/xyz 
as well as /usr/spool. The lines 

root, / 
, /usr 

allow any user to transfer files beginning with /usr but the user with 
login root can transfer any file. (Note that any file that is to be 
transferred must be readable by anybody.) 
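
The first two lookup rules (first line with a matching name, 
otherwise first line with a null name) can be sketched in Python as 
follows. This is an illustration of the rules as stated in this 
section, not the code used by uucp. The function names are 
hypothetical, and a pathname prefix is taken here to mean a plain 
leading-string match. 

```python
def parse_userfile(text):
    """Parse USERFILE lines of the form: login,sys [c] pathname ..."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        login, _, system = parts[0].partition(",")
        rest = parts[1:]
        callback = bool(rest) and rest[0] == "c"
        if callback:
            rest = rest[1:]
        entries.append({"login": login, "sys": system,
                        "callback": callback, "paths": rest})
    return entries


def first_match(entries, key, name):
    """First entry whose `key` equals name, else first with a null name."""
    for wanted in (name, ""):
        for entry in entries:
            if entry[key] == wanted:
                return entry
    return None


def paths_for_user(entries, login):
    """Pathname prefixes allowed for a command entered by a local user."""
    entry = first_match(entries, "login", login)
    return entry["paths"] if entry else []


def paths_for_system(entries, system):
    """Pathname prefixes allowed for a command from a remote machine."""
    entry = first_match(entries, "sys", system)
    return entry["paths"] if entry else []


def path_allowed(paths, pathname):
    return any(pathname.startswith(prefix) for prefix in paths)


local = parse_userfile("root, /\n, /usr\n")
remote = parse_userfile("u,m /usr/xyz /usr/spool\nu, /usr/spool\n")
print(paths_for_user(local, "root"), paths_for_user(local, "tom"))
print(paths_for_system(remote, "m"), paths_for_system(remote, "x"))
```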


Forwarding File 

There are two files that allow restrictions to be placed on the 
forwarding mechanism. The format of the entries in each file is the 
same, 


system 

or 

system,user,user2,... 


The file ORIGFILE (/usr/lib/uucp/ORIGFILE) restricts the access of 
systems that are attempting to forward through the local system. 
The file contains the list of systems (and users) for whom the local 
system is willing to forward. Each entry refers to the system that 
was the source of the original job and not the name of the last 
system to forward the file. The second file, FWDFILE 
(/usr/lib/uucp/FWDFILE), is a list of valid systems that a job can 
be forwarded to. (It is not necessarily the name of the destination of 
a job, but merely the next valid node.) This file will be a subset of the 
L.sys file and can be used to prevent forwarding to systems that are 
very expensive to reach but to which access by local users is allowed 
(e.g., links to overseas universities). If neither of these files 
exists, uucp will forward for any system. As an 
example, if the entry for system australia were in the ORIGFILE but 
not in the FWDFILE on system mhtsa, it would mean that system 
australia would be capable of forwarding jobs into the network via 
system mhtsa. However, no systems in the network could forward a 
job to australia via system mhtsa. 
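
The two checks can be sketched in Python as shown below. This is a 
hedged illustration only. The function names are hypothetical, the 
system name mhtsb is made up for the example, a file that is absent 
is assumed to impose no restriction, and user names listed in 
FWDFILE are ignored because their meaning is not described here. 

```python
def parse_forward_file(text):
    """Map each system to a set of users, or None if any user is accepted."""
    table = {}
    for line in text.splitlines():
        names = [n.strip() for n in line.split(",") if n.strip()]
        if names:
            table[names[0]] = set(names[1:]) or None
    return table


def may_forward(origin_sys, origin_user, next_sys,
                origfile=None, fwdfile=None):
    """Decide whether the local system forwards a job.

    origfile and fwdfile are tables from parse_forward_file, or None
    when the corresponding file does not exist.
    """
    if origfile is not None:
        if origin_sys not in origfile:
            return False
        users = origfile[origin_sys]
        if users is not None and origin_user not in users:
            return False
    if fwdfile is not None and next_sys not in fwdfile:
        return False
    return True


# The australia example as seen from system mhtsa.
orig = parse_forward_file("australia\nmhtsb\n")
fwd = parse_forward_file("mhtsb\n")
print(may_forward("australia", "tom", "mhtsb", orig, fwd))
print(may_forward("mhtsb", "tom", "australia", orig, fwd))
```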


ADMINISTRATION 

The role of the uucp administrator depends heavily on the amount of 
traffic that enters or leaves a system and the quality of the 
connections that can be made to and from that system. For the 
average system, only a modest amount of traffic (100 to 200 files per 
day) passes through the system and little if any intervention with the 
uucp automatic cleanup functions is necessary. Systems that pass 
large numbers of files (200 to 10,000) may require more attention 
when problems occur. The following parts describe the routine 
administrative tasks that must be performed by the administrator or 
are automatically performed by the uucp package. The part on 
problems describes the most frequent problems and how to 
deal with them effectively. 


Cleanup 

The biggest problem in a dialup network like uucp is dealing with the 
backlog of jobs that cannot be transmitted to other systems. The 
following cleanup activities should be routinely performed by shell 
scripts started from cron(1). 


Cleanup of Undeliverable Jobs 

The uudemon.day procedure usually contains an invocation of the 
uuclean command to purge any jobs that are older than some fixed 
time (usually 72 hours). A similar procedure is usually used to purge 
any lock or status files. An example invocation of uuclean(1M) to 
remove both job files and status files that are more than 48 hours 
old is: 

/usr/lib/uucp/uuclean -pST -pC -n48 




Cleanup of the Public Area 

In order to keep the local file system from overflowing when files are 
sent to the public area, the uudemon.day procedure is usually set 
up with a find command to remove any files that are older than 7 
days. This interval may need to be shortened if there is not 
sufficient space to devote to the public area. 


Compaction of Log Files 

The files SYSLOG and LOGFILE that contain logging information 
are compacted daily (using the pack command from the shell script 
uudemon.day) and should be kept for 1 week before being 
overwritten. 


Polling Other Systems 

Systems that are passive members of the network must be polled by 
other systems in order for their files to be sent. This can be 
arranged by using the uusub(1) command as follows: 


uusub -cmhtsd 


which will call mhtsd when it is invoked. 


Problems 

The following sections list the most frequent problems that appear on 
systems that make heavy use of uucp(1). 


Out of Space 

The file system used to spool incoming or outgoing jobs can run out 
of space and prevent jobs from being spawned or received from 
remote systems. The inability to receive jobs is the worse of the two 
conditions. When file space does become available, the system will be 
flooded with the backlog of traffic. 




Bad ACU and Modems 

The ACU and incoming modems occasionally cause problems that 
make it difficult to contact other systems or to receive files. These 
problems are readily identifiable since LOGFILE entries will 
usually point to the bad line. If a bad line is suspected, it is useful to 
use the cu(1) command to try calling another system using the 
suspected line. 

Administrative Problems 

Some uucp networks have so many members that it is difficult to 
keep track of changing passwords, changing phone numbers, or 
changing logins on remote systems. This can be a very costly 
problem since ACUs will be tied up calling a system that cannot be 
reached. 


DEBUGGING 

In order to verify that a system on the network can be contacted, the 
uucico daemon can be invoked from a user's terminal directly. For 
example, to verify that mhtsd can be contacted, a job would be 
queued for that system as follows: 

uucp -r file mhtsd!~/tom 

The -r option forces the job to be queued but does not invoke the 
daemon to process the job. The uucico command can then be 
invoked directly: 

/usr/lib/uucp/uucico -r1 -x4 -smhtsd 

The -r1 option is necessary to indicate that the daemon is to start 
up in master mode (i.e., it is the calling system). The -x4 specifies 
the level of debugging that is to be printed. Higher levels of 
debugging (greater than 4) can be printed but require familiarity 
with the internals of uucico. If several jobs are queued for the 
remote system, it is not possible to force uucico to send one 
particular job first. The contents of LOGFILE should also be 
monitored for any error indications that it posts. Frequently, 
problems can be isolated by examining the entries in LOGFILE 
associated with a particular system. The file ERRLOG also contains 
error indications. 


[Figure 10-1. Uucp Network Daemon (diagram: SYSTEM A and SYSTEM B 
joined by INTERCONNECTION MEDIA)] 

[Figure 10-2. Uucico Daemon Functional Blocks (diagram; includes a 
WORK LIST block)] 








